How to Resolve Hive Vertex Issues due to Vertex Failure with Null Pointer Exception!!

The use of Hive ROW_NUMBER() function may cause Vertex Failure with NullPointerException in Hive 0.13 version.

For example when u run below kind of queries in Hive.

SELECT F.*, ROW_NUMBER () OVER (PARTITION BY COL1 , COL2, COL3 , COL4 ORDER BY ACTIVE_POL_PICK_DT DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) RNK
FROM dB.TableName F WHERE COL5 > 0 AND COL6-COL8 < 0 AND COL1 IS NOT NULL AND COL2 IS NOT NULL AND COL3 IS NOT NULL AND COL4 IS NOT NULL;
Alternate Approach in Hive:

This approach uses Distribute By and SORY by clauses and resolve the vertex issues in Hive.

ADD JAR /tmp/nexr-hive-udf-0.2-SNAPSHOT.jar; (THis can be downloaded from http://nexr.github.io/hive-udf/)

Create a Custom Function :

CREATE TEMPORARY FUNCTION ROW_NUMBER_GENERIC AS ‘com.nexr.platform.hive.udf.GenericUDFRowNumber’;

SELECT SUB1.* FROM
(SELECT T1.*,ROW_NUMBER_GENERIC(T1.ACTIVE_POL_PICK_DT) as RNK FROM (SELECT F.* FROM db.TableName F WHERE DATEDIFF(COL5,COL6) <30 AND DATEDIFF(COL7,COL8) < 0 AND COL1 IS NOT NULL AND COL2 IS NOT NULL AND COL3 IS NOT NULL AND COL4 IS NOT NULL DISTRIBUTE BY COL1 ,COL2,COL3 ,COL4 SORT BY COL1 ,COL2,COL3 ,COL4,ACTIVE_POL_PICK_DT DESC)T1)SUB1

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s