
Error in hadoop jobs due to hive query error


Is this number too big to be converted to int?

Yes, this number is too big to convert to an integral type. According to the Apache Hive documentation on Numeric Types, the maximum value for a BIGINT is 9223372036854775807. Your input, 17664956244983174066, is larger than that.

The following is a vanilla Hive query (no DynamoDB integration) demonstrating the effects of attempting to convert various inputs to a BIGINT.

SELECT    "9223372036854775807" AS str,    cast("9223372036854775807" AS BIGINT) AS numbigint,    cast("9223372036854775807" AS DOUBLE) AS numdoubleUNION ALLSELECT    "9223372036854775808" AS str,    cast("9223372036854775808" AS BIGINT) AS numbigint,    cast("9223372036854775808" AS DOUBLE) AS numdoubleUNION ALLSELECT    "17664956244983174066" AS str,    cast("17664956244983174066" AS BIGINT) AS numbigint,    cast("17664956244983174066" AS DOUBLE) AS numdouble;    str numbigint   numdouble0   9223372036854775807 9223372036854775807 9.2233720368547758e+181   9223372036854775808 NULL    9.2233720368547758e+182   17664956244983174066    NULL    1.7664956244983173e+19

At the documented maximum value of BIGINT, the value converts correctly. At just 1 higher, the conversion fails, resulting in NULL. The same thing happens for your input.

The query also demonstrates that the conversion to DOUBLE succeeds. Depending on your use case, that may be a workable solution, but compared with an integral data type it carries the risk of floating-point precision loss.
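
To gauge that risk: Hive's DOUBLE is an IEEE 754 64-bit float, the same representation Python uses for its float type, so the loss can be reproduced outside Hive. The round trip below is a sketch in plain Python, not output from your cluster:

    # Demonstrates the precision loss of storing 17664956244983174066 in a
    # 64-bit IEEE 754 double (the representation behind Hive's DOUBLE).
    value = 17664956244983174066

    as_double = float(value)          # what cast(... AS DOUBLE) stores
    round_trip = int(as_double)

    print(as_double)                  # 1.7664956244983173e+19
    print(round_trip)                 # 17664956244983173120
    print(round_trip == value)        # False: the trailing digits are lost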

From your stack trace, it appears that the DynamoDB integration results in a NumberFormatException for this case rather than NULL. This is arguably a bug in the DynamoDB connector, but even if it were changed to map to NULL, you still wouldn't have a successful conversion.
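
The failure is purely a range problem, not a formatting one: the value would fit an unsigned 64-bit integer, but Hive's BIGINT (like the Java long that the connector presumably parses into) is a signed 8-byte integer. A quick check in plain Python:

    # Range check, no Hive or DynamoDB required.
    BIGINT_MAX = 2**63 - 1            # 9223372036854775807, per the Hive docs
    value = 17664956244983174066

    print(value > BIGINT_MAX)         # True: out of BIGINT range, hence NULL / the exception
    print(value < 2**64)              # True: fits an unsigned 64-bit integer,
                                      #       but Hive has no unsigned integral type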


The solution from AWS experts is to:

  1. git clone the open-source emr-dynamodb-connector.
  2. Modify the code.
  3. Build your own JAR.
  4. Upload it to the EMR cluster using a bootstrap action.
  5. In run_job_flow, send Hadoop environment configuration that appends the location of your JAR to HADOOP_CLASSPATH (a sketch follows below).

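For steps 4 and 5, a minimal run_job_flow sketch with boto3 might look like the following. The bucket, bootstrap script, JAR path, instance types, and IAM roles are placeholders I have assumed, not values from the question; the relevant pieces are the BootstrapActions entry and the hadoop-env/export configuration that appends to HADOOP_CLASSPATH.

    import boto3

    # Hypothetical location where the bootstrap script drops the rebuilt connector JAR.
    CUSTOM_JAR = "/home/hadoop/extrajars/emr-dynamodb-hadoop-patched.jar"

    emr = boto3.client("emr", region_name="us-east-1")

    response = emr.run_job_flow(
        Name="hive-dynamodb-patched-connector",
        ReleaseLabel="emr-6.9.0",
        Applications=[{"Name": "Hive"}],
        Instances={
            "MasterInstanceType": "m5.xlarge",
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 3,
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Step 4: bootstrap action that copies the patched JAR from S3 onto every
        # node (the script itself would just run `aws s3 cp ... /home/hadoop/extrajars/`).
        BootstrapActions=[
            {
                "Name": "copy-patched-dynamodb-connector",
                "ScriptBootstrapAction": {
                    "Path": "s3://my-bucket/bootstrap/copy_connector_jar.sh"
                },
            }
        ],
        # Step 5: append the JAR to HADOOP_CLASSPATH through the hadoop-env
        # classification so Hadoop and Hive load the patched classes.
        Configurations=[
            {
                "Classification": "hadoop-env",
                "Configurations": [
                    {
                        "Classification": "export",
                        "Properties": {
                            "HADOOP_CLASSPATH": "$HADOOP_CLASSPATH:" + CUSTOM_JAR
                        },
                    }
                ],
                "Properties": {},
            }
        ],
        JobFlowRole="EMR_EC2_DefaultRole",
        ServiceRole="EMR_DefaultRole",
    )

    print(response["JobFlowId"])
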
Not being much of a Java developer, modifying emr-dynamodb-connector was not feasible for me, but that is the solution. Beyond that, two things can be done: if you don't use strings in DynamoDB, map Hive STRING to DynamoDB number; otherwise, add mapping and support for Hive DECIMAL to DynamoDB number.


Your numbers are out of range for BIGINT.
Define everything as a string on both sides.