Designing an Initial and Incremental Load in Hive using ORC tables

Hi,

I had a use case that required designing a landing layer in Hive, so I explored the options in and around Hadoop and Hive. I found that ORC tables gave the best overall efficiency in Hive, so I designed the landing layer as shown in the diagram below.

[Diagram: Picture5]

Source DB – Oracle

Hadoop Distribution used – Hortonworks HDP 2.1

Steps I designed:

1. Sqoop import into the STG* tables.
2. Insert the data from the STG tables into the Hive ORC tables (external tables, ORC-formatted, compressed, partitioned).
3. Drop the STG tables after the EXT tables have loaded successfully.
4. Sqoop the delta records into the STG_DELTA tables (Sqoop direct import into Hive).
5. Load the STG_DELTA records into the EXT tables.

This way, you can maintain an initial and incremental load for your use cases.
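
To make the five steps more concrete, here is a minimal sketch in Python that only builds the Sqoop command lines and HiveQL statements as text; it does not run anything. Everything specific in it is an assumption for illustration and not from my actual setup: the JDBC URL, the user and password file, the SALES table and its columns, the stg_sales / stg_sales_delta / ext_sales names, the HDFS location, and ZLIB as the ORC compression. The delta import is sketched with Sqoop's append mode on a numeric key, which may not fit every source table.

```python
import shlex

# Hypothetical placeholders: adjust to your own environment.
JDBC_URL = "jdbc:oracle:thin:@//dbhost:1521/ORCL"
SOURCE_TABLE = "SALES"
CHECK_COLUMN = "SALE_ID"  # assumed to be a monotonically increasing key


def sqoop_import(hive_table, last_value=None):
    """Build a Sqoop command that imports the Oracle table directly into Hive."""
    cmd = [
        "sqoop", "import",
        "--connect", JDBC_URL,
        "--username", "etl_user",
        "--password-file", "/user/etl/.oracle_pw",
        "--table", SOURCE_TABLE,
        "--hive-import",
        "--hive-table", hive_table,
        "--num-mappers", "4",
    ]
    if last_value is not None:
        # Delta load: only rows whose check column is greater than last_value.
        cmd += ["--incremental", "append",
                "--check-column", CHECK_COLUMN,
                "--last-value", str(last_value)]
    return cmd


# External, partitioned, compressed ORC table (the EXT table of step 2).
CREATE_EXT = """
CREATE EXTERNAL TABLE IF NOT EXISTS ext_sales (
  sale_id BIGINT,
  amount  DOUBLE
)
PARTITIONED BY (sale_date STRING)
STORED AS ORC
LOCATION '/data/landing/ext_sales'
TBLPROPERTIES ('orc.compress'='ZLIB')
""".strip()


def load_ext(staging_table, drop_after):
    """HiveQL that copies a staging table into the ORC table by partition."""
    statements = [
        "SET hive.exec.dynamic.partition=true",
        "SET hive.exec.dynamic.partition.mode=nonstrict",
        # The partition column must come last in the SELECT list.
        "INSERT INTO TABLE ext_sales PARTITION (sale_date) "
        f"SELECT sale_id, amount, sale_date FROM {staging_table}",
    ]
    if drop_after:
        statements.append(f"DROP TABLE IF EXISTS {staging_table}")
    return statements


def initial_load():
    """Steps 1 to 3: full import, load into ORC, drop the staging table."""
    return ([shlex.join(sqoop_import("stg_sales")), CREATE_EXT]
            + load_ext("stg_sales", drop_after=True))


def delta_load(last_value):
    """Steps 4 and 5: import only new rows, then append them to the ORC table."""
    return ([shlex.join(sqoop_import("stg_sales_delta", last_value))]
            + load_ext("stg_sales_delta", drop_after=False))


if __name__ == "__main__":
    for step in initial_load() + delta_load(last_value=1000):
        print(step, end="\n\n")
```

The steps above do not say what happens to STG_DELTA after each run, so the sketch leaves it in place. If the same delta table is reused, it would need to be emptied between runs to avoid loading the same rows into the EXT table twice.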

2 thoughts on “Designing an Initial and Incremental Load in Hive using ORC tables”

  1. Hi.

    I am new to Hive and am using Sqoop to build an incremental load strategy on a Hadoop data lake for my university. I have been through tons of websites, but they tend to talk in circles. Could you please elaborate on the above process?

  2. Hi Giri,
    Makes it sound real simple. Probably is. Could you please give a simple demo of how to do it, as you have done for other topics, which I have followed and gained much from? I would really appreciate your help.
    Regards,
    RD