How to Load Data into an Elasticsearch Index through Pig

REGISTER /tmp/elasticsearch-hadoop-2.0.0.RC1.jar;

REGISTER /tmp/elasticsearch-hadoop-pig-2.0.0.RC1.jar;

load_pam_raw_data = LOAD '/QA//PAM/incremental/rawpamdata/2014-05-27/part-*' USING PigStorage('|') AS (
    AGNT_ID:chararray,
    LST_NM:chararray,
    MDL_NM:chararray,
    FRST_NM:chararray,
    PRF_LST_NM:chararray,
    PRF_MDL_NM:chararray,
    PRF_FRST_NM:chararray,
    AGNT_STATUS:chararray,
    STTS_EFFCTV_DT:chararray,
    AGNCY_ID:chararray,
    LCTN_ID:chararray,
    CHNL_TYP:chararray,
    AGNT_TYP_ID:chararray,
    AGNT_ADRS_LN1:chararray,
    AGNT_ADRS_LN2:chararray,
    AGNT_ST:chararray,
    AGNT_CITY_NM:chararray,
    AGNT_ZP_CD:chararray,
    BSNS_PHN:chararray
);

STORE load_pam_raw_data INTO 'pig_pam_index/data' USING org.elasticsearch.hadoop.pig.EsStorage('es.nodes=node1:9200,node2:9200','es.index.auto.create = true');
If the ES index is already present, you can set es.index.auto.create=false. With that setting the job fails if the index is missing instead of creating it. In a similar way, we can read ES data through Pig, which we will see in my next post.
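
As a minimal sketch of that case, assuming the REGISTER and LOAD statements from this post have already run and that an index named pig_pam_index already exists in the cluster, the STORE step with automatic index creation turned off might look like this:

```pig
-- Assumes load_pam_raw_data was loaded as shown earlier in this post
-- and that the pig_pam_index index already exists (an assumption).
STORE load_pam_raw_data INTO 'pig_pam_index/data'
    USING org.elasticsearch.hadoop.pig.EsStorage(
        'es.nodes=node1:9200,node2:9200',
        'es.index.auto.create=false');
```

With this setting the job is expected to fail when pig_pam_index is missing, rather than silently creating a new index.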

2 thoughts on “How to Load Data into an Elasticsearch Index through Pig”

  1. Well, I've tried doing this:
    REGISTER /usr/elasticsearch-hadoop-2.0.2/dist/elasticsearch-hadoop-pig-2.0.2.jar

    --load the CDR.csv file
    cdr= LOAD '/user/admin/CDR_OMAR.csv' using PigStorage(';')
    AS
    (traffic_type_id:int,caller:int,call_time:datetime,tranche_horaire:int,called:int,called:int,call_duration:int,code_type:chararray,code_destination:chararray,location:chararray,id_offre:int,id_service:int,date_heure_appel:chararray);

    --STORE cdr INTO 'indexOmar/typeOmar' USING EsStorage('es.nodes'='0.44.162.169:9200')
    STORE cdr INTO 'telecom/cdr' USING org.elasticsearch.hadoop.pig.EsStorage('es.nodes'='10.44.162.169',
    'es.mapping.names=call_time:@timestamp',
    'es.index.auto.create = false');

    But, I got this error :
    Run pig script using PigRunner.run() for Pig version 0.8+
    2015-03-06 14:22:21,768 [main] INFO org.apache.pig.Main - Apache Pig version 0.12.0-cdh5.3.1 (rexported) compiled Jan 27 2015, 14:45:17
    2015-03-06 14:22:21,770 [main] INFO org.apache.pig.Main - Logging error messages to: /yarn/nm/usercache/admin/appcache/application_1425457357655_0009/container_1425457357655_0009_01_000002/pig-job_1425457357655_0009.log
    2015-03-06 14:22:21,863 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /var/lib/hadoop-yarn/.pigbootup not found
