I had a use case that called for designing a landing layer in Hive, so I explored the options in and around Hadoop and Hive. I found that ORC tables gave the best overall efficiency in Hive, so I designed the landing layer as shown in the diagram below.
Source DB – Oracle
Hadoop Distribution used – Hortonworks HDP 2.1
Steps I designed:
1. Sqoop import into STG* tables
2. Insert the data from the STG tables into Hive ORC tables (external tables, ORC formatted, compressed, and partitioned)
3. Drop the STG tables after the EXT tables have loaded successfully
4. Sqoop the delta records into STG_DELTA tables (Sqoop direct import into Hive)
5. Load the STG_DELTA records into the EXT tables
This way, you can maintain both an initial and an incremental load for your use cases.
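To make the five steps concrete, here is a minimal sketch in Python that only builds the Sqoop command lines and HiveQL statements as strings, so you can review them before running anything. Every name in it is a hypothetical placeholder and not part of my actual setup: the JDBC URL, the `ORDERS` source table, the `stg_orders`, `stg_delta_orders` and `ext_orders` Hive tables, the two columns, the `LAST_UPDATED` delta column, the `load_date` partition, the HDFS location, and the ZLIB compression choice.

```python
"""Sketch of the landing-layer steps as Sqoop commands and HiveQL strings.

All table, column, host, user, and path names are hypothetical placeholders.
"""

JDBC_URL = "jdbc:oracle:thin:@//oracle-host:1521/ORCL"  # assumed Oracle endpoint
SOURCE_TABLE = "ORDERS"                                  # assumed source table
STG, STG_DELTA, EXT = "stg_orders", "stg_delta_orders", "ext_orders"
COLUMNS = "order_id, amount"                             # assumed column list


def sqoop_import(hive_table, where=None):
    """Build a `sqoop import` argument list that loads directly into a Hive table."""
    cmd = [
        "sqoop", "import",
        "--connect", JDBC_URL,
        "--username", "etl_user",
        "--password-file", "/user/etl/.oracle_password",
        "--table", SOURCE_TABLE,
        "--hive-import",
        "--hive-table", hive_table,
    ]
    if where:
        # Restrict the import to delta rows with a plain WHERE filter.
        cmd += ["--where", where]
    return cmd


def create_ext_table():
    """DDL for the external, partitioned, compressed ORC table."""
    return (f"CREATE EXTERNAL TABLE IF NOT EXISTS {EXT} "
            "(order_id BIGINT, amount DOUBLE) "
            "PARTITIONED BY (load_date STRING) "
            "STORED AS ORC "
            f"LOCATION '/data/landing/{EXT}' "
            "TBLPROPERTIES ('orc.compress'='ZLIB')")


def load_ext(from_table, load_date):
    """Append the rows of a staging table into one partition of the EXT table."""
    return (f"INSERT INTO TABLE {EXT} PARTITION (load_date='{load_date}') "
            f"SELECT {COLUMNS} FROM {from_table}")


def drop_table(name):
    return f"DROP TABLE IF EXISTS {name}"


def initial_load(load_date):
    """Steps 1 to 3: full import, load into ORC, drop staging."""
    return [sqoop_import(STG), create_ext_table(),
            load_ext(STG, load_date), drop_table(STG)]


def delta_load(load_date, since):
    """Steps 4 and 5: import rows changed after `since`, then append them."""
    where = f"LAST_UPDATED > TO_DATE('{since}', 'YYYY-MM-DD')"
    return [sqoop_import(STG_DELTA, where), load_ext(STG_DELTA, load_date)]


if __name__ == "__main__":
    steps = initial_load("2014-06-01") + delta_load("2014-06-02", "2014-06-01")
    for step in steps:
        # Sqoop steps are argument lists; Hive steps are single statements.
        print(" ".join(step) if isinstance(step, list) else step + ";")
```

Running the script prints the commands in order; the Sqoop lines can go to a shell and the HiveQL lines to the Hive CLI. Note that `INSERT INTO` only appends, so this sketch handles new rows; if your delta also contains updated rows, you would need an extra merge or deduplication step that is not shown here.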