Hortonworks Hadoop 2.X Administrator Certification

Hi All,

I cleared this certification as well yesterday (1/27/2015), with a score of 78%. The exam was a little challenging. Guidelines and tips/tricks are coming soon 🙂

 

Core topics covered in the examination. Please read all of the references.

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.0.2/bk_releasenotes_hdp_2.0/content/ch_relnotes-hdp2.0.0.1_3.html
http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.2/bk_using_Ambari_book/content/ambari-chap1-6.html
http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.2.0/bk_using_Ambari_book/bk_using_Ambari_book-20130114.pdf
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.7/bk_cluster-planning-guide/bk_cluster-planning-guide-20141028.pdf
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.9.1/bk_cluster-planning-guide/content/ch02s02.html
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.3/bk_reference/content/reference_chap2.html
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.7/bk_installing_manually_book/bk_installing_manually_book-20141028.pdf
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.9.1/bk_installing_manually_book/content/upgrade-4-1.html
http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.7/bk_user-guide/content/user-guide-hdfs-nfs.html
http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.3/bk_dataintegration/content/ch_using-oozie.html
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.7/bk_performance_tuning/content/ch01s01s01.html
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.1/bk_system-admin-guide/content/enabling_capacity_scheduler.html
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.5/bk_system-admin-guide/content/ch_hadoop-ha-6.html
http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.7/bk_using_Ambari_book/content/ambari-kerb-2.x.html
http://dev.hortonworks.com.s3.amazonaws.com/HDPDocuments/HDP2/HDP-2.1.1/bk_system-admin-guide/content/ch_acls-on-hdfs.html
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1-latest/index.html

http://hortonworks.com/blog/multi-tenancy-in-hdp-2-0-capacity-scheduler-and-yarn/

 

I am also attaching some important documents from Hortonworks. Do some Sandbox-level hands-on work before appearing for the exam, as the exam is heavily focused on practical knowledge. You must have a good understanding of hdfs-site.xml, mapred-site.xml, hive-site.xml, yarn-site.xml, capacity-scheduler.xml, rack awareness shell scripts, the topology script, balancer scripts, etc. Please go through all the materials.
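
For example, rack awareness is typically wired up by pointing Hadoop at a topology script in core-site.xml. A minimal sketch, assuming a script at the hypothetical path /etc/hadoop/conf/topology_script.sh:

<property>
  <name>net.topology.script.file.name</name>
  <value>/etc/hadoop/conf/topology_script.sh</value>
</property>

The script itself just maps each DataNode host/IP to a rack path such as /rack1.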

 

bk_system-admin-guide-20141028

bk_user-guide-20141028

bk_using_Ambari_book-20140715

bk_webhdfs-20141028

bk_cluster-planning-guide-20141028

bk_getting-started-guide-20141028

bk_installing_manually_book-20141028

bk_performance_tuning-20140718

bk_reference-20140702

bk_Security_Guide-20141028

 

Good Luck!!!!…

 

 

Hortonworks Hadoop 2.0 Java Developer Certification

Hortonworks Certification Tips and Guidelines

Certification 2 – Hortonworks Certified Apache Hadoop Developer (Java)

I successfully completed this certification on Nov 24, 2014 with a passing score of 90%. I am sharing the experience I gained from this certification, along with all the materials I went through to prepare for it. Please get some Sandbox-level hands-on experience with these topics before you appear for the examination.

Exam Format

Course Curriculum

Objective 1.1 – HDFS and MapReduce

  • Understand how the NameNode maintains the filesystem metadata
  • Understand how data is stored in HDFS
  • Understand the WebHDFS commands
  • Understand the hadoop fs command
  • Understand the relationship between NameNodes and DataNodes
  • Understand the relationship between NameNodes and namespaces in Hadoop 2.0
  • Understand how HDFS Federation works in Hadoop 2.0
  • Understand the various components of NameNode HA in Hadoop 2.0
  • Understand the architecture of MapReduce
  • Understand the various phases of a MapReduce job
  • Demonstrate how key/value pairs flow through a MapReduce job
  • Write Java Mapper and Reducer classes
  • Use the org.apache.hadoop.mapreduce.Job class to configure a MapReduce job
  • Use the TextInputFormat and TextOutputFormat classes
  • Write a custom InputFormat
  • Configure a Combiner
  • Define a custom Combiner
  • Define a custom Partitioner (see the sketch after this list)
  • Use the Distributed Cache
  • Use the CompositeInputFormat class
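
For the custom Partitioner item above, here is a minimal sketch; the class name and routing logic are my own illustration, not from the exam:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        // Send keys starting with a-m to one partition and the rest to another,
        // folded into the configured number of reduce tasks.
        String word = key.toString();
        char first = word.isEmpty() ? 'a' : Character.toLowerCase(word.charAt(0));
        return (first <= 'm' ? 0 : 1) % numPartitions;
    }
}

It is wired in with job.setPartitionerClass(FirstLetterPartitioner.class). For reference, the default HashPartitioner returns (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks.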

Objective 2.1 – YARN

  • Understand the architecture of YARN
  • Understand the components of the YARN ResourceManager
  • Demonstrate the relationship between NodeManagers and ApplicationMasters
  • Demonstrate the relationship between ResourceManagers and ApplicationMasters
  • Explain the relationship between Containers and ApplicationMasters
  • Explain how Container failure is handled for a YARN MapReduce job

Objective 3.1 – Pig and Hive

  • Differentiate between Pig data types, including the complex types bag, tuple and map
  • Define Pig relations
  • Write a User-Defined Pig Function
  • Invoke a Pig UDF
  • Explain how Hive tables are defined and implemented
  • Manage External vs. Hive-managed tables
  • Write a User-Defined Hive Function (see the sketch after this list)
  • Invoke a Hive UDF
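
For the Hive UDF items above, a minimal sketch using the classic org.apache.hadoop.hive.ql.exec.UDF base class (the class name and logic are illustrative):

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class ReverseString extends UDF {
    // Hive resolves the evaluate method by reflection.
    public Text evaluate(Text input) {
        if (input == null) return null;
        return new Text(new StringBuilder(input.toString()).reverse().toString());
    }
}

In Hive you would then run ADD JAR followed by CREATE TEMPORARY FUNCTION reverse_str AS 'ReverseString'; before invoking it in a query.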

Objective 4.1 – Hadoop 2.0

  • Understand the relationship between NameNodes and DataNodes
  • Understand the relationship between NameNodes and namespaces in Hadoop 2.0
  • Explain how HDFS Federation works in Hadoop 2.0
  • Demonstrate understanding of the various components of NameNode HA in Hadoop 2.0

Objective 5.1 – HBase

  • Use the HBase API to add or delete a row to an HBase table
  • Use the HBase API to retrieve data from an HBase table

Objective 6.1 – Workflow

  • Understand Oozie workflow actions
  • Deploy Oozie workflows
  • Use the org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl class to define a workflow

Focus Areas:

  • Hadoop: The Definitive Guide by Tom White is your primary textbook (Chapters 2 to 7). Apart from this, refer to the Apache documentation for the references below.

  • Topics and materials under Objectives 1.1 and 2.1 from the Pig/Hive certification
  • Get a clear understanding of the MR wrapper classes like IntWritable, LongWritable, DoubleWritable, etc.
  • Mapper class API and the map method
  • Reducer class API and the reduce method
  • Job class API, the run method, and the other 'set' methods of the Job class
  • Get a thorough understanding of the WordCount Mapper, Reducer, and Job classes so you are familiar with a generic MR program
  • MR architecture, and study how key/value pairs flow through all phases
  • Importance of the Combiner and its parent class
  • Importance of the Partitioner and its parent class; the default partitioner (HashPartitioner and the hashCode method); a custom Partitioner and its methods
  • Secondary sorting: writing a custom key class, a custom value class, and a group comparator class
  • InputFormat class, custom InputFormats, and their methods
  • The various file format types and their basic methods
  • Parameters used for map-phase and reduce-phase optimization
  • Enabling compression in MR, and the various types of compression
  • RawComparator class and its methods
  • Localization techniques like the Distributed Cache of files, archives, etc.
  • The various joins (map, reduce, bucket) in MR and the use of CompositeInputFormat
  • Use of BloomFilters in a map-side join
  • MRUnit: how to test a Mapper, a Reducer, and a full MapReduce job using the driver classes (see the sketch after this list)
  • HBase APIs like HTable, TableMapper, TableReducer
  • HBase data insertion, retrieval, and deletion APIs and their methods
  • Pig UDF and Accumulator UDF classes like EvalFunc and FilterFunc, and the exec method
  • Hive UDF, UDAF, and UDTF classes and methods like evaluate
  • The JobControl class (org.apache.hadoop.mapreduce.lib.jobcontrol) and how to set job dependencies with it
  • MR counters and custom counters
  • FSDataInputStream, FSDataOutputStream
  • Refer to the HW lab contents for better understanding
  • Get programming-level practice in all the above areas, because the questions are highly practical.
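
Picking up the MRUnit item above: a minimal sketch of a mapper test, assuming the MRUnit jar (mapreduce flavor) and JUnit are on the classpath, and reusing the WordCountMapper shown later in this post:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Test;

public class WordCountMapperTest {
    @Test
    public void testMapper() throws Exception {
        MapDriver<LongWritable, Text, Text, IntWritable> driver =
                MapDriver.newMapDriver(new WordCountMapper());
        driver.withInput(new LongWritable(0), new Text("hello hello"))
              .withOutput(new Text("hello"), new IntWritable(1))
              .withOutput(new Text("hello"), new IntWritable(1))
              .runTest();
    }
}

ReduceDriver and MapReduceDriver follow the same withInput/withOutput/runTest pattern for testing the reducer and the full job.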

MR LifeCycle Sample Programs:

http://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html#Mapper

Writable Types:

https://developer.yahoo.com/hadoop/tutorial/module5.html#writable

JobControl Example:

JobControl control = new JobControl("testjob");               // a group of jobs with dependencies
ControlledJob cjob1 = new ControlledJob(job1, null);          // cjob1 has no dependencies
List<ControlledJob> dependentJobs = new ArrayList<ControlledJob>();
dependentJobs.add(cjob1);
ControlledJob cjob2 = new ControlledJob(job2, dependentJobs); // cjob2 runs only after cjob1
control.addJob(cjob1);
control.addJob(cjob2);
new Thread(control).start();   // JobControl implements Runnable
while (!control.allFinished()) {
    Thread.sleep(1000);
}
control.stop();

HBase Put, Get, Delete APIs:

https://autofei.wordpress.com/2012/04/02/java-example-code-using-hbase-data-model-operations/
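
To complement the link above, a minimal sketch of the three operations using the classic HTable client (table name, column family, and values are hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseCrudExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");

        // Put: insert (or update) a cell in row "row1"
        Put put = new Put(Bytes.toBytes("row1"));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("value1"));
        table.put(put);

        // Get: read the cell back
        Get get = new Get(Bytes.toBytes("row1"));
        Result result = table.get(get);
        System.out.println(Bytes.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col"))));

        // Delete: remove the entire row
        table.delete(new Delete(Bytes.toBytes("row1")));

        table.close();
    }
}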

FileInputFormat and RecordReader APIs:

https://hadoopi.wordpress.com/2013/05/27/understand-recordreader-inputsplit/
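
To complement the link above, a minimal custom InputFormat sketch. It simply delegates to the built-in LineRecordReader; a real custom format would return its own RecordReader implementation:

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

public class MyTextInputFormat extends FileInputFormat<LongWritable, Text> {
    @Override
    public RecordReader<LongWritable, Text> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        // Produce (byte offset, line of text) pairs, just like TextInputFormat.
        return new LineRecordReader();
    }
}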

Pig Accumulator UDF:

http://stackoverflow.com/questions/14924059/any-good-examples-of-pig-accumulator-interface-implementation-that-works
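
The link above covers the Accumulator interface; for contrast, here is a minimal sketch of the more common plain EvalFunc UDF (class name and logic are illustrative). Pig calls exec() once per input tuple:

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class ToUpper extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        return ((String) input.get(0)).toUpperCase();
    }
}

In a Pig script you would REGISTER the jar and then invoke it, e.g. B = FOREACH A GENERATE ToUpper(name);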

Thoroughly Study the MR Life Cycle:

Mapper:

public class WordCountMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String currentLine = value.toString();
        String[] words = currentLine.split(" ");
        for (String word : words) {
            Text outputKey = new Text(word);
            context.write(outputKey, new IntWritable(1));
        }
    }
}

Reducer:

public class WordCountReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : values) {
            sum += count.get();
        }
        IntWritable outputValue = new IntWritable(sum);
        context.write(key, outputValue);
    }
}

Job Class:

public class WordCountJob extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf(), "WordCountJob");
        job.setJarByClass(getClass());
        Path in = new Path(args[0]);
        Path out = new Path(args[1]);
        FileInputFormat.setInputPaths(job, in);
        FileOutputFormat.setOutputPath(job, out);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setCombinerClass(WordCountReducer.class); // safe here because summing is associative
        job.setNumReduceTasks(2);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new WordCountJob(), args)); // standard Tool entry point
    }
}


Hortonworks Hadoop 2.0 Developer Certification

Hortonworks Certification Tips and Guidelines

I successfully completed this certification on Oct 24, 2014 with a passing score of 88%. I am sharing the experience I gained from this certification, along with all the materials I went through to prepare for it. Please get some Sandbox-level hands-on experience with these topics before you appear for the examination.

Read all the answer choices for a question very carefully before you select one, because some of them are a little tricky. Also, 90 minutes is more than enough for this exam; you can easily complete it in 45 to 60 minutes. All the best!

Certification 1 – Hortonworks Certified Apache Hadoop Developer (Pig and Hive)

Exam Format

  • Registration cost: $200 per attempt (unlimited attempts are allowed)
  • The exam consists of approximately 50 multiple-choice questions and is delivered in English
  • You have to answer 38 questions correctly (75%) to get certified
  • Certification references at Hortonworks: http://hortonworks.com/training/hadoop-2-0-developer-certification/

Course Curriculum

Objective 1.1 – HDFS and Hadoop 2.0

  • Explain Hadoop 2.0 and YARN
  • Explain how HDFS Federation works in Hadoop 2.0
  • Explain the various tools and frameworks in the Hadoop 2.0 ecosystem
  • Use the Hadoop client to input data into HDFS
  • Use HDFS commands (see the sketch after this list)
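
For the two HDFS items above, a minimal sketch using the Java FileSystem API (the paths are hypothetical); the shell equivalent is hadoop fs -put /tmp/data.txt /user/hadoop/:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsPutExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // picks up core-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);
        fs.copyFromLocalFile(new Path("/tmp/data.txt"),           // local source
                             new Path("/user/hadoop/data.txt"));  // HDFS destination
        fs.close();
    }
}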

Various study materials I referred to for this objective:

Objective 2.1 – MapReduce and YARN

  • Explain the architecture of MapReduce
  • Run a MapReduce job on Hadoop
  • Monitor a MapReduce job

Various study materials I referred to for this objective:

Objective 3.1 – Pig

  • Write a Pig script to explore and transform data in HDFS
  • Define advanced Pig relations
  • Use Pig to apply structure to unstructured Big Data
  • Invoke a Pig User-Defined Function
  • Compute Quartiles with Pig
  • Explore data with Pig
  • Split a dataset with Pig
  • Join datasets with Pig
  • Use Pig to prepare data for Hive

Objective 4.1 – Hive and HCatalog

  • Write a Hive query
  • Understand how Hive tables are defined and implemented
  • Use Hive to run SQL-like queries to perform data analysis
  • Perform a multi-table select in Hive
  • Design a proper schema for Hive
  • Explain the uses and purpose of HCatalog
  • Use HCatalog with Pig and Hive
  • Compute ngrams with Hive
  • Analyze Big Data with Hive
  • Understand MapReduce in Hive
  • Join datasets with Hive
  • Stream data with Hive and Python

Various study materials I referred to for objectives 3.1 and 4.1:

Objective 5.1 – Hadoop Tools

  • Use Sqoop to transfer data between HDFS/Hadoop and a relational database (RDBMS)
  • Use HCatalog with Pig
  • Define a workflow using Oozie

Various study materials I referred to for the above objective:

Additional Oozie References:

Oozie

Oozie (another reference)

Focus More on:

  • Functional flow of YARN: Client -> ResourceManager -> ApplicationMaster -> Containers
  • hadoop fs get (copyToLocal), put (copyFromLocal), and cat commands
  • Pig JOIN, LEFT JOIN, RIGHT JOIN, FULL JOIN (please find the attachment below)
  • Pig replicated, merge, and skew joins
  • REGISTER and invoke Pig UDFs
  • Hive managed vs. external tables, and how table data lands under the /apps/hive/warehouse directory
  • HCatLoader, HCatStorer
  • Hive SORT BY
  • MR default Partitioner
  • WebHDFS OPEN, CREATE, MKDIRS, LISTSTATUS, GETFILESTATUS operations (see the sketch after this list)
  • Hive ngrams
  • HDFS Federation vs. DataNode vs. NameNode failures
  • Get a clear programming-level understanding of Pig and Hive scripts
  • Apache HCatalog
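
For the WebHDFS item above, a minimal sketch that calls the REST API directly (host, port, and path are hypothetical; 50070 is the usual NameNode HTTP port on HDP 2.x):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class WebHdfsListStatus {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://namenode.example.com:50070/webhdfs/v1/user/hadoop?op=LISTSTATUS");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line); // JSON FileStatuses response
        }
        in.close();
        conn.disconnect();
    }
}

The other operations follow the same pattern: OPEN and GETFILESTATUS are GET requests, while CREATE and MKDIRS are PUT requests.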

Pig Joins

Some More Notables (the corresponding screenshots are attached as Pictures 1 through 6):

1. Pig picks the file from the current working directory
2. Pig sample command
3. Top 50 bigrams
4. HCatLoader and HCatStorer
5. NameNode Federation
6. YARN life cycle

I am sure you will be able to follow my guidelines for this Pig and Hive certification. Wishing you all the best, and congratulations in advance!