How to Ingest the data into ElasticSearch through Hive

Recently I got a use case to load the data into Elastic Search.  So, when I looked at the options I found we can achieve the goal using Pig/Hive ElasticSearch Connectors easily. I have used Tez in Hive for this option.

SET hive.execution.engine = tez;
SET mapreduce.framework.name=yarn-tez;
SET tez.queue.name=agent;
SET hive.vectorized.execution.enabled=true;
SET hive.auto.convert.join=true;
SET hive.compute.query.using.stats = [true, **false**];
SET hive.stats.fetch.column.stats = [true, **false**];
SET hive.stats.fetch.partition.stats = [true, **false**];
SET hive.cbo.enable = [true, **false**];

Input Driverdata:

1000-54-7908032332029JT3HN87R6X90246441|1000-54-7908|032332029|JT3HN87R6X9024644|1|GIRI V|COLL|FIRST ST,AZ

1000-54-7907032332028JT3HN87R6X90246431|1000-54-7907|032332028|JT3HN87R6X9024643|1|WESLEY D|COMP|SECOND ST,AZ
1000-54-7906032332027JT3HN87R6X90246421|1000-54-7906|032332027|JT3HN87R6X9024642|1|JASWIND S|COLL|THIRD ST,AZ
1000-54-7905032332026JT3HN87R6X90246411|1000-54-7905|032332026|JT3HN87R6X9024641|1|KAMAL S|COMP|FOURTH ST,AZ
1000-54-7904032332025JT3HN87R6X90246401|1000-54-7904|032332025|JT3HN87R6X9024640|1|ARUN N|COLL|FIFTH ST,AZ
1000-54-7902032332024JT3HN87R6X90246451|1000-54-7902|032332024|JT3HN87R6X9024645|1|MATHAN R|COMP|SIXTH ST,AZ
1000-54-7908032332029JT3HN87R6X90246442|1000-54-7908|032332029|JT3HN87R6X9024644|2|GIRI V|BI|FIRST ST,AZ
1000-54-7907032332028JT3HN87R6X90246432|1000-54-7907|032332028|JT3HN87R6X9024643|2|WESLEY D|BI|SECOND ST,AZ
1000-54-7906032332027JT3HN87R6X90246422|1000-54-7906|032332027|JT3HN87R6X9024642|2|JASWIND S|BI|THIRD ST,AZ
1000-54-7905032332026JT3HN87R6X90246412|1000-54-7905|032332026|JT3HN87R6X9024641|2|ARUN N|BI|FOURTH ST,AZ
1000-54-7904032332025JT3HN87R6X90246402|1000-54-7904|032332025|JT3HN87R6X9024640|2|KAMAL S|BI|FIFTH ST,AZ
1000-54-7902032332024JT3HN87R6X90246452|1000-54-7902|032332024|JT3HN87R6X9024645|2|MATHAN R|BI|SIXTH ST,AZ

Create a Hive table to hold the above Driver data:

CREATE TABLE HIVETEST.DRIVERDATA_TMP

(
KEY STRING,
CLAIMNUMBER STRING,
POL_NBR STRING,
VEHICLE_NBR STRING,
EXP_NBR STRING,
NAME STRING,
COVERAGE STRING,
ADDRESS STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ‘|’
LINES TERMINATED BY ‘\n’
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH ‘/opt/md/WIP/giri/driverdata.txt’ OVERWRITE INTO TABLE HIVETEST.DRIVERDATA_TMP;

Create an Index in elasticsearch:

curl -XPUT http://localhost:9200/driverdata/?pretty -d'{“rels”:{“properties”:{“key”:{“type”:”string”},”claimnumber”:{“type”:”string”},”pol_nbr”:{“type”:”string”},”vehicle_nbr”:{“type”:”string”},”exp_nbr”:{“type”:”string”},”name”:{“type”:”string”},
“coverage”:{“type”:”string”},”address”:{“type”:”string”}}}}’

This is the External jar you should add while creating and loading the Hive data to ElasticSearch. You can download the latest jar from http://www.elasticsearch.org

ADD JAR /opt/md/giri/ES-Hadoop/elasticsearch-hadoop-2.0.0.RC1.jar;

Create an External table to hold the Hive data for ElasticSearch Index

CREATE EXTERNAL TABLE HIVETEST.DRIVERDATA_ES_JAN2
(
KEY STRING,
CLAIMNUMBER STRING,
POL_NBR STRING,
VEHICLE_NBR STRING,
EXP_NBR STRING,
NAME STRING,
COVERAGE STRING,
ADDRESS STRING
)
STORED BY ‘org.elasticsearch.hadoop.hive.EsStorageHandler’
TBLPROPERTIES (
‘es.resource’ = ‘driverdata/data’,
‘es.nodes’=’node1:9200,node2:9200,etc’,
‘es.index.auto.create’ = ‘false’,
‘es.mapping.id’=’key’);

es.index.auto.create=false — Since the Index is already created in ElasticSearch. Otherwise you can directly feed the data to ES index by giving this property to true.

Ingest the Hive data to ElasticSearch:

INSERT OVERWRITE TABLE HIVETEST.DRIVERDATA_ES_JAN2
SELECT
KEY,
CLAIMNUMBER,
POL_NBR,
VEHICLE_NBR,
EXP_NBR,
NAME,
COVERAGE,
ADDRESS
FROM
HIVETEST.DRIVERDATA_TMP;

You can do a select * from HIVETEST.DRIVERDATA_ES_JAN2 to view the ES data in Hive. Or Navigate to the head plugin of ES index and view the Index data there. I have not put the snapshot. I will put it later.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s