Hortonworks Hadoop 2.0 Java Developer Certification

Hortonworks Certification Tips and guidelines

Certification 2 – Hortonworks Certified Apache Hadoop Developer (Java)

I successfully completed this certification on Nov 24, 2014 with a passing score of 90%. I am sharing the experience I gained from this certification, and I have listed all the materials I went through to prepare for it. Please get some sandbox-level hands-on experience with these topics before you appear for the examination.

Exam Format

Course Curriculum

Objective 1.1 – HDFS and MapReduce

  • Understand how the NameNode maintains the filesystem metadata
  • Understand how data is stored in HDFS
  • Understand the WebHDFS commands
  • Understand the hadoop fs command
  • Understand the relationship between NameNodes and DataNodes
  • Understand the relationship between NameNodes and namespaces in Hadoop 2.0
  • Understand how HDFS Federation works in Hadoop 2.0
  • Understand the various components of NameNode HA in Hadoop 2.0
  • Understand the architecture of MapReduce
  • Understand the various phases of a MapReduce job
  • Demonstrate how key/value pairs flow through a MapReduce job
  • Write Java Mapper and Reducer classes
  • Use the org.apache.hadoop.mapreduce.Job class to configure a MapReduce job
  • Use the TextInputFormat and TextOutputFormat classes
  • Write a custom InputFormat
  • Configure a Combiner
  • Define a custom Combiner
  • Define a custom Partitioner
  • Use the Distributed Cache (see the sketch after this list)
  • Use the CompositeInputFormat class
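
To make the Distributed Cache item above concrete, here is a minimal mapper sketch, assuming a driver that ships a small lookup file with job.addCacheFile(new URI("/user/hdfs/lookup.txt")). The path, file name, and types are hypothetical, chosen only for illustration:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Mapper that loads a small lookup file shipped via the distributed cache.
// "lookup.txt" is a hypothetical file name used only for illustration.
public class CacheMapper extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        URI[] cacheFiles = context.getCacheFiles();
        if (cacheFiles != null && cacheFiles.length > 0) {
            // On YARN, each cached file is symlinked into the task's working
            // directory under its base name, so it can be opened like a local file.
            BufferedReader reader = new BufferedReader(new FileReader("lookup.txt"));
            // ... load the lookup data into an in-memory map here ...
            reader.close();
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // join each input record against the in-memory lookup data
    }
}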

Objective 2.1 – YARN

  • Understand the architecture of YARN
  • Understand the components of the YARN ResourceManager
  • Demonstrate the relationship between NodeManagers and ApplicationMasters
  • Demonstrate the relationship between ResourceManagers and ApplicationMasters
  • Explain the relationship between Containers and ApplicationMasters
  • Explain how Container failure is handled for a YARN MapReduce job

Objective 3.1 – Pig and Hive

  • Differentiate between Pig data types, including the complex types bag, tuple and map
  • Define Pig relations
  • Write a User-Defined Pig Function
  • Invoke a Pig UDF
  • Explain how Hive tables are defined and implemented
  • Manage External vs. Hive-managed tables
  • Write a User-Defined Hive Function (see the sketch after this list)
  • Invoke a Hive UDF
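
For the Hive UDF bullets above, here is a minimal sketch in the classic style: extend the UDF base class and define an evaluate method, which Hive locates by reflection. The class name and logic are illustrative only:

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// A trivial Hive UDF that lower-cases a string column.
public class ToLower extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;  // pass NULLs through
        }
        return new Text(input.toString().toLowerCase());
    }
}

To invoke it, register the jar and create a temporary function (ADD JAR, then CREATE TEMPORARY FUNCTION my_lower AS 'ToLower'), and call my_lower(col) in a query.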

Objective 4.1 – Hadoop 2.0

  • Understand the relationship between NameNodes and DataNodes
  • Understand the relationship between NameNodes and namespaces in Hadoop 2.0
  • Explain how HDFS Federation works in Hadoop 2.0
  • Demonstrate understanding of the various components of NameNode HA in Hadoop 2.0

Objective 5.1 – HBase

  • Use the HBase API to add a row to or delete a row from an HBase table
  • Use the HBase API to retrieve data from an HBase table

Objective 6.1 – Workflow

  • Understand Oozie workflow actions
  • Deploy Oozie workflows
  • Use the org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl class to define a workflow

Focus Areas:

  • Hadoop: The Definitive Guide by Tom White is your primary textbook (Chapters 2 to 7). Apart from this, refer to the Apache documentation for the references below.

  • Topics and materials under Objectives 1.1 and 2.1 from Pig/Hive Certification
  • Get a clear understanding of MR wrapper classes like IntWritable, LongWritable, DoubleWritable, etc.
  • Mapper class API and the map method.
  • Reducer class API and the reduce method.
  • Job class API: the run method and the various set methods.
  • Get a thorough understanding of the WordCount Mapper, Reducer, and Job classes so you will be familiar with a generic MR program.
  • MR architecture; study how key/value pairs flow through all phases
  • Importance of the Combiner and its parent class
  • Importance of the Partitioner and its parent class; the default Partitioner (HashPartitioner and the hashCode method); a custom Partitioner and its methods (see the sketch after this list)
  • Secondary sorting; writing a custom key class, a custom value class, and a group comparator class
  • InputFormat class; a custom InputFormat and its methods
  • Various file format types and their basic methods
  • Parameters used for Map Phase Optimization, Reducer Phase Optimization
  • Enable Compression in MR, Various types of Compression
  • RawComparator class and its methods
  • Localization techniques like Distributed Cache of Files, Archives, etc.
  • Various Joins (Map, Reduce, Bucket) in MR and use of CompositeInputFormat
  • Use of BloomFilters in Map Side Join
  • MRUnit (how to test a Mapper, a Reducer, and a full MapReduce job using the driver classes)
  • HBase APIs like HTable, TableMapper, and TableReducer
  • HBase data insertion, retrieval, and deletion APIs and their methods
  • Pig UDF and Accumulator UDF classes like EvalFunc and FilterFunc, and the exec method
  • Hive UDF, UDAF, and UDTF classes and methods like evaluate
  • The JobControl class and how to set job dependencies in it
  • MR Counters, Custom Counters
  • FSDataInputStream, FSDataOutputStream
  • Refer to the HW lab contents for better understanding
  • Get programming-level practice in all of the above areas, because the questions are mostly practical.
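
As referenced in the Partitioner bullet above, here is a minimal custom Partitioner sketch. The bucketing rule is invented for illustration; for comparison, the default HashPartitioner computes (key.hashCode() & Integer.MAX_VALUE) % numPartitions.

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Sends words starting with a-m to one reducer and the rest to another.
public class AlphabetPartitioner extends Partitioner<Text, IntWritable> {

    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        String word = key.toString();
        int bucket = (!word.isEmpty()
                && Character.toLowerCase(word.charAt(0)) <= 'm') ? 0 : 1;
        return bucket % numPartitions;  // stay in range if only one reducer runs
    }
}

It is plugged into a job with job.setPartitionerClass(AlphabetPartitioner.class).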

MR Life Cycle Sample Programs:

http://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html#Mapper

Writable Types:

https://developer.yahoo.com/hadoop/tutorial/module5.html#writable
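
Building on the Writable types above, here is a minimal sketch of a custom composite key of the kind used for secondary sorting; the field names are hypothetical:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

// Composite key: natural key (symbol) plus a secondary sort field (timestamp).
public class StockKey implements WritableComparable<StockKey> {
    private String symbol;
    private long timestamp;

    public StockKey() {}  // Hadoop instantiates keys reflectively

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(symbol);
        out.writeLong(timestamp);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        symbol = in.readUTF();
        timestamp = in.readLong();
    }

    @Override
    public int compareTo(StockKey other) {
        int cmp = symbol.compareTo(other.symbol);
        return (cmp != 0) ? cmp : Long.compare(timestamp, other.timestamp);
    }

    @Override
    public int hashCode() {
        // partitioning should depend on the natural key only,
        // so all records for a symbol reach the same reducer
        return symbol.hashCode();
    }

    @Override
    public boolean equals(Object o) {
        return (o instanceof StockKey) && compareTo((StockKey) o) == 0;
    }
}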

JobControl Example:

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;

// job1 and job2 are fully configured org.apache.hadoop.mapreduce.Job instances
JobControl control = new JobControl("testjob");

ControlledJob cjob1 = new ControlledJob(job1, null);

// cjob2 will not start until every job in dependentJobs has completed
List<ControlledJob> dependentJobs = new ArrayList<ControlledJob>();
dependentJobs.add(cjob1);
ControlledJob cjob2 = new ControlledJob(job2, dependentJobs);

control.addJob(cjob1);
control.addJob(cjob2);

// JobControl implements Runnable, so it runs in its own thread;
// poll until both jobs finish, then stop the thread
new Thread(control).start();
while (!control.allFinished()) {
    Thread.sleep(1000);
}
control.stop();

HBase Put, Get, Delete APIs:

https://autofei.wordpress.com/2012/04/02/java-example-code-using-hbase-data-model-operations/
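
For reference, a minimal sketch of the basic row operations using the classic (pre-1.0) HTable client; the table name, column family, and values are hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseCrudDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "users");  // hypothetical table

        // Insert (or overwrite) a cell
        Put put = new Put(Bytes.toBytes("row1"));
        put.add(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("alice"));
        table.put(put);

        // Read the row back
        Get get = new Get(Bytes.toBytes("row1"));
        Result result = table.get(get);
        byte[] name = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
        System.out.println(Bytes.toString(name));

        // Delete the entire row
        Delete delete = new Delete(Bytes.toBytes("row1"));
        table.delete(delete);

        table.close();
    }
}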

FileInputFormat and RecordReader APIs:

https://hadoopi.wordpress.com/2013/05/27/understand-recordreader-inputsplit/
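
A minimal custom InputFormat sketch showing the two methods typically overridden. This one just delegates to the stock LineRecordReader and marks files non-splittable; a real custom format would supply its own RecordReader:

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

public class WholeFileLineInputFormat extends FileInputFormat<LongWritable, Text> {

    @Override
    public RecordReader<LongWritable, Text> createRecordReader(
            InputSplit split, TaskAttemptContext context)
            throws IOException, InterruptedException {
        // delegate to the built-in line reader
        return new LineRecordReader();
    }

    @Override
    protected boolean isSplitable(JobContext context, Path filename) {
        return false;  // one mapper per file (illustrative choice)
    }
}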

Pig Accumulator UDF:

http://stackoverflow.com/questions/14924059/any-good-examples-of-pig-accumulator-interface-implementation-that-works
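
For the simpler, non-Accumulator case, here is a minimal Pig EvalFunc sketch; the class name and logic are illustrative. Pig calls exec() once per input tuple:

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Upper-cases its first argument, returning null for empty input.
public class ToUpper extends EvalFunc<String> {

    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        return ((String) input.get(0)).toUpperCase();
    }
}

After REGISTER-ing the jar in a Pig script, it can be invoked like any built-in: B = FOREACH A GENERATE ToUpper(name);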

Thoroughly Study the MR Life Cycle:

Mapper:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // split each input line on spaces and emit (word, 1)
        String currentLine = value.toString();
        String[] words = currentLine.split(" ");
        for (String word : words) {
            Text outputKey = new Text(word);
            context.write(outputKey, new IntWritable(1));
        }
    }
}

Reducer:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // sum all the 1s emitted for this word and write the total
        int sum = 0;
        for (IntWritable count : values) {
            sum += count.get();
        }
        IntWritable outputValue = new IntWritable(sum);
        context.write(key, outputValue);
    }
}

Job Class:

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountJob extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf(), "WordCountJob");
        job.setJarByClass(getClass());

        // input and output paths come from the command line
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        // summing is associative, so the reducer doubles as the combiner
        job.setCombinerClass(WordCountReducer.class);
        job.setNumReduceTasks(2);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new WordCountJob(), args));
    }
}



9 thoughts on “Hortonworks Hadoop 2.0 Java Developer Certification”

  1. I am looking for a practice test for “HORTONWORKS HADOOP 2.0 JAVA DEVELOPER CERTIFICATION”.
    Any idea where I can find it?

      • I registered for the exam; however, I didn’t get any practice test. Is there anything else I need to do to get the tests?

      • Not sure about that. They usually have a sample exam in the registered exams list or under it. You will also need to register for the sample exam, but you can launch it immediately. Even if you don’t get it, don’t worry; whatever I have mentioned in this blog should mostly cover it.

  2. I was looking at the Hortonworks website and they mention the practical aspects, but it isn’t mentioned here. I don’t have the system configuration (sufficient RAM and processor); how can I get in-depth, hands-on experience with Hadoop?
