Hive Optimization Techniques in Hadoop 2.1

I did a lot of research and finally arrived at some optimizations for Hive SQL. For large table joins and high volumes of data, I prefer Apache Tez over MapReduce (MR) as the execution engine.

Enable the properties below in your Hive SQL for large volumes of data:

SET hive.execution.engine = tez;
SET hive.vectorized.execution.enabled=true;
SET hive.compute.query.using.stats = true;
SET hive.stats.fetch.column.stats = true;
SET hive.stats.fetch.partition.stats = true;
SET hive.cbo.enable = true;
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.parallel=true;
SET hive.exec.reducers.bytes.per.reducer=1000000000; (depends on the total size of all tables in the HQL)
SET hive.mapjoin.smalltable.filesize=1000000000;
SET; (depends on map memory capacity)
SET hive.hadoop.supports.splittable.combineinputformat=true;
SET hive.mapjoin.optimized.keys=true;
SET hive.mapjoin.lazy.hashtable=true;
SET hive.exec.parallel.thread.number=16;
SET hive.merge.mapfiles=true;
SET hive.merge.mapredfiles=true;
SET hive.optimize.skewjoin=true;
SET hive.optimize.bucketmapjoin=true;
SET hive.mapred.supports.subdirectories=true;
SET mapred.input.dir.recursive=true;
SET mapreduce.job.reduces=-1;
SET hive.exec.compress.intermediate=true;
SET hive.exec.compress.output=true;
SET tez.runtime.intermediate-output.should-compress=true;
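
As a quick illustration of how these settings come together, the sketch below shows a dynamic-partition insert that benefits from several of them (Tez execution, dynamic partitioning in nonstrict mode, and ORC storage for vectorized execution). The table and column names are hypothetical, not from any real schema:

```sql
-- Illustrative only: sales_raw and sales_part are hypothetical tables.
-- Storing as ORC lets hive.vectorized.execution.enabled take effect.
CREATE TABLE IF NOT EXISTS sales_part (
  order_id BIGINT,
  amount   DECIMAL(10,2)
)
PARTITIONED BY (sale_date STRING)
STORED AS ORC;

-- With hive.exec.dynamic.partition.mode=nonstrict, all partition values
-- are resolved at runtime from the SELECT output; the partition column
-- must come last in the select list.
INSERT OVERWRITE TABLE sales_part PARTITION (sale_date)
SELECT order_id, amount, sale_date
FROM sales_raw;
```

Note also how hive.exec.reducers.bytes.per.reducer interacts with mapreduce.job.reduces=-1: with the reducer count left at -1, Hive estimates it from input size, so a 10 GB input divided by the 1000000000-byte (roughly 1 GB) setting above yields about 10 reducers.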
