How to Sqoop Export a Hive ORC Table to an Oracle Database?

This post assumes some familiarity with Hive ORC tables. We will create an ORC table in Hive, feed it some sample data, and create a matching table structure in Oracle.

We are going to use the Sqoop-HCatalog integration here. Type "sqoop export --help" in Bash to see which Sqoop export parameters relate to HCatalog. I got the details below.

HCatalog arguments:
--hcatalog-database <arg>                 HCatalog database name
--hcatalog-home <hdir>                    Override $HCAT_HOME
--hcatalog-table <arg>                    HCatalog table name
--hive-home <dir>                         Override $HIVE_HOME
--hive-partition-key <partition-key>      Sets the partition key to use when importing to Hive
--hive-partition-value <partition-value>  Sets the partition value to use when importing to Hive
--map-column-hive <arg>                   Override mapping for specific columns to Hive types

Now see the sample code snippets below for a very simple Sqoop Export Process.

Step 1:
Create a Hive table in text format. (You can create this in any format, such as Avro, RCFile, etc.)
create table customers_txt (customer_id string, customer_name string, city string) row format delimited fields terminated by ',' stored as textfile;
OK
Time taken: 0.611 seconds

Step 2:
Create a Hive table in ORC format
create table customers(customer_id string, customer_name string, city string) row format delimited fields terminated by ',' stored as orc;
OK
Time taken: 0.594 seconds

Step 3:
Create some test data here
bash-4.1$ vi customer.txt
1,giri,nellai
2,giri,town
3,giri,pheonix
4,parthi,glendale
5,ram,glendale
6,goutam,campbell
:wq!
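If you prefer not to open vi, the same sample file can be created from the shell. A sketch, using comma delimiters to match the "fields terminated by ','" clause in the table DDL above:

```shell
# Create the same six-row sample file non-interactively.
# The comma delimiter matches the Hive table definitions in Steps 1 and 2.
printf '%s\n' \
  '1,giri,nellai' \
  '2,giri,town' \
  '3,giri,pheonix' \
  '4,parthi,glendale' \
  '5,ram,glendale' \
  '6,goutam,campbell' > /tmp/customer.txt

head -1 /tmp/customer.txt   # 1,giri,nellai
```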

Step 4:
Load test data to a Txt formatted table
hive> load data local inpath '/tmp/customer.txt' into table customers_txt;
Copying data from file:/tmp/customer.txt
Copying file: file:/tmp/customer.txt
Loading data to table default.customers_txt
Table default.customers_txt stats: [numFiles=1, numRows=0, totalSize=92, rawDataSize=0]
OK
Time taken: 1.301 seconds

Verify the data
hive> select * from customers_txt;
OK
1 giri nellai
2 giri town
3 giri pheonix
4 parthi glendale
5 ram glendale
6 goutam campbell
Time taken: 0.456 seconds, Fetched: 6 row(s)

Step 5:
Insert the data into the ORC table:
insert overwrite table customers select * from customers_txt;

Step 6:
Execute the below Sqoop Export Command
sqoop export --connect jdbc:oracle:thin:@Servername:1521/dbName --username *********** --password ********* --table dbName.CUSTOMERS --hcatalog-table customers
Verify the Sqoop Job Output:
15/09/08 17:02:26 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1438142065989_98389
15/09/08 17:02:27 INFO impl.YarnClientImpl: Submitted application application_1438142065989_98389
15/09/08 17:02:27 INFO mapreduce.Job: The url to track the job: http://n01bdl303.aap.csaa.pri:8088/proxy/application_1438142065989_98389/
15/09/08 17:02:27 INFO mapreduce.Job: Running job: job_1438142065989_98389
15/09/08 17:02:38 INFO mapreduce.Job: Job job_1438142065989_98389 running in uber mode : false
15/09/08 17:02:38 INFO mapreduce.Job: map 0% reduce 0%
15/09/08 17:02:46 INFO mapreduce.Job: map 100% reduce 0%
15/09/08 17:02:46 INFO mapreduce.Job: Job job_1438142065989_98389 completed successfully
15/09/08 17:02:46 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=269257
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=10756
HDFS: Number of bytes written=0
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters
Launched map tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=5338
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=5338
Total vcore-seconds taken by all map tasks=5338
Total megabyte-seconds taken by all map tasks=24298576
Map-Reduce Framework
Map input records=6
Map output records=6
Input split bytes=10112
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=41
CPU time spent (ms)=1740
Physical memory (bytes) snapshot=337338368
Virtual memory (bytes) snapshot=8627503104
Total committed heap usage (bytes)=2070413312
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
15/09/08 17:02:46 INFO mapreduce.ExportJobBase: Transferred 10.5039 KB in 30.9629 seconds (347.3836 bytes/sec)
15/09/08 17:02:46 INFO mapreduce.ExportJobBase: Exported 6 records.

Oracle Output:

(Screenshot of the exported rows as seen in Oracle, captured 9-8-2015.)

Debugging:

Below are some of the errors you might face, along with troubleshooting steps.

Root cause 1: The table name case does not match Oracle's. (For example, the Oracle table is CUSTOMERS in uppercase, but the Sqoop export command uses customers in lowercase.)

Note: use the table name exactly as it is stored in Oracle; otherwise Sqoop export throws the error below. In the working command above, the table name is in uppercase, matching Oracle.

sqoop export --connect jdbc:oracle:thin:@Servername:1521/dbName --username ******** --password ******** --table dbName.customers --hcatalog-table customers

15/09/08 16:49:23 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM customers t WHERE 1=0
15/09/08 16:49:23 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.IllegalArgumentException: Attempted to generate class with no columns!
java.lang.IllegalArgumentException: Attempted to generate class with no columns!
at org.apache.sqoop.orm.ClassWriter.generateClassForColumns(ClassWriter.java:1316)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1197)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:96)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:64)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:222)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:231)
at org.apache.sqoop.Sqoop.main(Sqoop.java:240)
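Sqoop looks the table up through JDBC metadata, which is case-sensitive even though SQL itself folds unquoted identifiers to uppercase. One defensive option (a sketch of my own, not from the original run) is to uppercase the --table argument before calling sqoop, since Oracle stores identifiers created without double quotes in uppercase:

```shell
# Uppercase the table argument so it matches how Oracle stores
# identifiers that were created without double quotes.
TABLE="dbName.customers"
TABLE_UC="$(printf '%s' "$TABLE" | tr '[:lower:]' '[:upper:]')"
echo "$TABLE_UC"   # DBNAME.CUSTOMERS

# Then pass it along, e.g. (connection details as in the commands above):
# sqoop export --connect jdbc:oracle:thin:@Servername:1521/dbName \
#   --username myuser -P --table "$TABLE_UC" --hcatalog-table customers
```

This only helps for tables created with unquoted names; quoted, mixed-case Oracle tables need the quoting approach discussed in the comments below.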

Root cause 2: The table does not exist in Oracle. The target table structure must exist in Oracle before the Sqoop export.

bash-4.1$ sqoop export --connect jdbc:oracle:thin:@Servername:1521/dbName --username ******** --password ******** --table dbName.CUSTOMERS1 --hcatalog-table custom
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
15/09/08 17:20:45 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4.2.1.5.0-695
15/09/08 17:20:45 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/09/08 17:20:46 INFO manager.SqlManager: Using default fetchSize of 1000
15/09/08 17:20:46 INFO tool.CodeGenTool: Beginning code generation
15/09/08 17:20:47 INFO manager.OracleManager: Time zone has been set to GMT
15/09/08 17:20:47 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM EADS_SIS_HADOOP.CUSTOMERS1 t WHERE 1=0
15/09/08 17:20:47 ERROR manager.SqlManager: Error executing statement: java.sql.SQLException: ORA-00942: table or view does not exist


3 thoughts on "How to Sqoop Export a Hive ORC Table to an Oracle Database?"

  1. How about the scenario of a Sqoop export from HDFS to Oracle where the table in Oracle has to be mixed case?
    I created a schema and table in Oracle 11g using Sqoop export commands:
    Schema Name – TestV100
    Table Name – Xy100

    I am now trying to load the table using a tab delimited file:
    /usr/bin/sudo -u hdfs sqoop export --connect jdbc:oracle:thin:@oracle11:1521:orcl -username test -password password "/hdfs_nfs_mount/tmp/oracle/TestV100/Xy100.txt" --table "\"\"TestV100\".\"Xy100\"\"" --fields-terminated-by "\t" --input-null-string null -m 1

    I get this error:
    15/11/03 13:38:57 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "TestV100"."Xy100" t WHERE 1=0
    15/11/03 13:38:57 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.IllegalArgumentException: There is no column found in the target table "TestV100"."Xy100". Please ensure that your table name is correct.
    java.lang.IllegalArgumentException: There is no column found in the target table "TestV100"."Xy100". Please ensure that your table name is correct.

    This is the input file:
    a 0 0
    a 1 4
    a 2 2

    Oracle Version: 11.1.0.6.0
    Sqoop Version: 1.4.5-cdh5.4.1
    Ojdbc version: ojdbc6.jar

    If I connect to the database directly (using SQL Developer) as the 'test' user and run this command, it works fine (it returns zero rows, basically):
    SELECT t.* FROM "TestV100"."Xy100" t WHERE 1=0;
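For reference (not verified against that exact Sqoop/Oracle combination): a mixed-case Oracle identifier must reach Sqoop with its double quotes intact, which means escaping them from the shell, and the extra doubled quotes in the command above may be what breaks the lookup. A sketch of building that argument:

```shell
# Build the --table argument for a mixed-case Oracle table.
# The inner double quotes must survive the shell, so Oracle does not
# fold TestV100.Xy100 to uppercase during the JDBC metadata lookup.
TABLE_ARG="\"TestV100\".\"Xy100\""
echo "$TABLE_ARG"   # "TestV100"."Xy100"

# Usage sketch (placeholders for connection details):
# sqoop export --connect jdbc:oracle:thin:@oracle11:1521:orcl \
#   -username test -P --table "$TABLE_ARG" ...
```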


  2. I get an error when trying this from an AWS EMR master node. Any idea what to do? When I searched online, many people said I have to add something to tnsnames.ora, but that file only comes with installing the SQL*Plus client. Do I really have to install anything on the master node?

    ERROR manager.SqlManager: Error executing statement: java.sql.SQLException: Listener refused the connection with the following error:
    ORA-12514, TNS:listener does not currently know of service requested in connect descriptor

    java.sql.SQLException: Listener refused the connection with the following error:
    ORA-12514, TNS:listener does not currently know of service requested in connect descriptor
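A likely cause (an assumption on my part, going by the commenter's 1521:orcl connect string): ORA-12514 usually means the JDBC URL uses SID syntax against a listener that only knows the database by its service name. The thin driver needs no tnsnames.ora, so switching the URL form is often enough. The two forms, with placeholder host and names:

```shell
# SID form: a colon before the SID.
SID_URL="jdbc:oracle:thin:@dbhost:1521:ORCL"

# Service-name form: // after the @ and a slash before the service name.
SERVICE_URL="jdbc:oracle:thin:@//dbhost:1521/orclsvc"

echo "$SERVICE_URL"
```

On RDS or other managed Oracle instances, the service-name form is typically the one that works.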

