How to Sqoop Export a Hive ORC Table to an Oracle Database?

I hope you already have some idea about Hive ORC tables. Let's create an ORC table in Hive and feed it some sample data. We also need a matching table structure in Oracle before the export.
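For reference, here is a minimal sketch of the Oracle DDL. The column names mirror the Hive table created below; the VARCHAR2 lengths are my assumption, so size them for your own data.

-- run on the Oracle side, in the schema that dbName refers to later
CREATE TABLE CUSTOMERS (
  CUSTOMER_ID   VARCHAR2(50),
  CUSTOMER_NAME VARCHAR2(100),
  CITY          VARCHAR2(100)
);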

We are going to use the Sqoop-HCatalog integration here. Just type "sqoop help export" in Bash and see which Sqoop export parameters relate to HCatalog. I got the details below.

HCatalog arguments:
--hcatalog-database <arg>                  HCatalog database name
--hcatalog-home <hdir>                     Override $HCAT_HOME
--hcatalog-table <arg>                     HCatalog table name
--hive-home <dir>                          Override $HIVE_HOME
--hive-partition-key <partition-key>       Sets the partition key to use when importing to hive
--hive-partition-value <partition-value>   Sets the partition value to use when importing to hive
--map-column-hive <arg>                    Override mapping for specific column to hive types

Now see the sample code snippets below for a very simple Sqoop export process.

Step 1:
Create a Hive table in text format. (You could also create it in any other format, such as Avro or RCFile.)
create table customers_txt (customer_id string, customer_name string, city string) row format delimited fields terminated by ',' stored as textfile;
OK
Time taken: 0.611 seconds

Step 2:
Create a Hive table in ORC format
create table customers(customer_id string, customer_name string, city string) row format delimited fields terminated by ',' stored as orc;
OK
Time taken: 0.594 seconds
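If you want to double-check the storage format, describe the table (optional sanity check):

hive> describe formatted customers;

The output should list org.apache.hadoop.hive.ql.io.orc.OrcInputFormat as the InputFormat.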

Step 3:
Create some test data here
bash-4.1$ vi customer.txt
1,giri,nellai
2,giri,town
3,giri,pheonix
4,parthi,glendale
5,ram,glendale
6,goutam,campbell
:wq!

Step 4:
Load the test data into the text-formatted table
hive> load data local inpath '/tmp/customer.txt' into table customers_txt;
Copying data from file:/tmp/customer.txt
Copying file: file:/tmp/customer.txt
Loading data to table default.customers_txt
Table default.customers_txt stats: [numFiles=1, numRows=0, totalSize=92, rawDataSize=0]
OK
Time taken: 1.301 seconds

Verify the data
hive> select * from customers_txt;
OK
1 giri nellai
2 giri town
3 giri pheonix
4 parthi glendale
5 ram glendale
6 goutam campbell
Time taken: 0.456 seconds, Fetched: 6 row(s)

Step 5:
Insert the data into the ORC table:
insert overwrite table customers select * from customers_txt;
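You can verify the ORC table the same way before exporting; it should return the same six rows:

hive> select * from customers;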

Step 6:
Execute the Sqoop export command below:
sqoop export --connect jdbc:oracle:thin:@Servername:1521/dbName --username *********** --password ********* --table dbName.CUSTOMERS --hcatalog-table customers
Verify the Sqoop Job Output:
15/09/08 17:02:26 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1438142065989_98389
15/09/08 17:02:27 INFO impl.YarnClientImpl: Submitted application application_1438142065989_98389
15/09/08 17:02:27 INFO mapreduce.Job: The url to track the job: http://n01bdl303.aap.csaa.pri:8088/proxy/application_1438142065989_98389/
15/09/08 17:02:27 INFO mapreduce.Job: Running job: job_1438142065989_98389
15/09/08 17:02:38 INFO mapreduce.Job: Job job_1438142065989_98389 running in uber mode : false
15/09/08 17:02:38 INFO mapreduce.Job: map 0% reduce 0%
15/09/08 17:02:46 INFO mapreduce.Job: map 100% reduce 0%
15/09/08 17:02:46 INFO mapreduce.Job: Job job_1438142065989_98389 completed successfully
15/09/08 17:02:46 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=269257
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=10756
HDFS: Number of bytes written=0
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters
Launched map tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=5338
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=5338
Total vcore-seconds taken by all map tasks=5338
Total megabyte-seconds taken by all map tasks=24298576
Map-Reduce Framework
Map input records=6
Map output records=6
Input split bytes=10112
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=41
CPU time spent (ms)=1740
Physical memory (bytes) snapshot=337338368
Virtual memory (bytes) snapshot=8627503104
Total committed heap usage (bytes)=2070413312
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
15/09/08 17:02:46 INFO mapreduce.ExportJobBase: Transferred 10.5039 KB in 30.9629 seconds (347.3836 bytes/sec)
15/09/08 17:02:46 INFO mapreduce.ExportJobBase: Exported 6 records.

Oracle Output:

[Screenshot: the exported rows as seen in the Oracle CUSTOMERS table]
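To verify on the Oracle side yourself, query the target table (dbName is the same placeholder schema used in the export command); all six records exported by the job should be present:

SELECT * FROM dbName.CUSTOMERS;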

Debugging:

Below are some of the errors you might face and the corresponding troubleshooting steps.

Root cause 1: The table name case does not match Oracle (e.g. the Oracle table is CUSTOMERS in upper case, but the Sqoop export uses customers in lower case).

Note: use the table name with exactly the same case as it has in Oracle, otherwise Sqoop export throws the error below. (In the working command above, the table name is in upper case to match Oracle.)

sqoop export --connect jdbc:oracle:thin:@Servername:1521/dbName --username ******** --password ******** --table dbName.customers --hcatalog-table customers

15/09/08 16:49:23 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM customers t WHERE 1=0
15/09/08 16:49:23 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.IllegalArgumentException: Attempted to generate class with no columns!
java.lang.IllegalArgumentException: Attempted to generate class with no columns!
at org.apache.sqoop.orm.ClassWriter.generateClassForColumns(ClassWriter.java:1316)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1197)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:96)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:64)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:222)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:231)
at org.apache.sqoop.Sqoop.main(Sqoop.java:240)

Root cause 2: The table does not exist in Oracle. The table structure must exist in Oracle before the Sqoop export.

bash-4.1$ sqoop export --connect jdbc:oracle:thin:@Servername:1521/dbName --username ******** --password ******** --table dbName.CUSTOMERS1 --hcatalog-table custom
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
15/09/08 17:20:45 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4.2.1.5.0-695
15/09/08 17:20:45 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/09/08 17:20:46 INFO manager.SqlManager: Using default fetchSize of 1000
15/09/08 17:20:46 INFO tool.CodeGenTool: Beginning code generation
15/09/08 17:20:47 INFO manager.OracleManager: Time zone has been set to GMT
15/09/08 17:20:47 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM EADS_SIS_HADOOP.CUSTOMERS1 t WHERE 1=0
15/09/08 17:20:47 ERROR manager.SqlManager: Error executing statement: java.sql.SQLException: ORA-00942: table or view does not exist
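Before running the export you can confirm the table really exists in Oracle with a quick data-dictionary query (a sketch; replace the owner with your own schema). If it returns no rows, create the table first:

SELECT table_name FROM all_tables WHERE owner = 'DBNAME' AND table_name = 'CUSTOMERS1';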


How to import BLOB/CLOB columns to Hive via Sqoop

sqoop import \
-Dmapred.job.queue.name=default \
--connect jdbc:oracle:thin:@hostname:port/servicename \
--username Xxxxxx \
--password XXXXXX \
--query "SELECT * FROM tablename WHERE \$CONDITIONS" \
--hive-drop-import-delims \
--map-column-java column1=String,column2=String \
-m 8 \
--hive-import \
--hive-table tablename \
--target-dir /user/hdfs/ \
--fields-terminated-by '\001' \
--split-by id;

column1/column2: any CLOB/BLOB column converted to a Java String value via --map-column-java.
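After the import, a quick spot check in Hive confirms the LOB columns landed as strings (column1 and tablename are the same placeholders used in the command above):

hive> select length(column1) from tablename limit 5;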

How to Connect to AS400 DB2 via Sqoop and Import the data into HDFS or Hive

Hi,

Today I got a requirement to import AS400 DB2 data into HDFS or into Hive tables.

Please download jt400.jar and place it in the /usr/lib/sqoop/lib directory on the Hadoop box.
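A minimal sketch of that step, assuming the jar was downloaded to /tmp:

bash-4.1$ cp /tmp/jt400.jar /usr/lib/sqoop/lib/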

Then I used the Sqoop script below to import into Hive.

sqoop import \
--driver com.ibm.as400.access.AS400JDBCDriver \
--connect jdbc:as400://as400servername/dbname \
--username xxxxxxxx --password xxxxxxxx \
--query 'SELECT * FROM <table_name> where $CONDITIONS' \
--hive-import \
--hive-table table_name \
--target-dir /user/hdfs/example \
--fields-terminated-by '\001' \
--split-by <id> \
-m 4;
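Once the job finishes, a quick row count in Hive confirms the import (table_name is the placeholder from the script above):

hive> select count(*) from table_name;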