Why is HDFS throwing LeaseExpiredException in a Hadoop cluster (AWS EMR)?

apache amazon-web-services hadoop hive apache-tez

I resolved the issue. Let me explain in detail.

Exceptions that were occurring -

LeaseExpiredException - from the HDFS side.
FileNotFoundException - from the Hive side (when the Tez execution engine executes the DAG).

Problem scenario -

We had just upgraded Hive from version 0.13.0 to 2.1.0. Everything was working fine with the previous version, with zero runtime exceptions.

Different thoughts to resolve the issue -

The first thought was that two threads were working on the same piece of data because of NN intelligence. But with speculative execution disabled by the settings below,
set mapreduce.map.speculative=false;
set mapreduce.reduce.speculative=false;

that was not possible.

Then I increased the limits from 1000 to 100000 for the settings below -
SET hive.exec.max.dynamic.partitions=100000;
SET hive.exec.max.dynamic.partitions.pernode=100000;

That also didn't work.

The third thought was that, within the same process, what mapper-1 had created was being deleted by another mapper/reducer. But we didn't find any such entries in the HiveServer2 or Tez logs.

Finally, the root cause turned out to lie in the application layer code itself. The hive-exec 2.1.0 version has a configuration property that 0.13.0 did not have:
"hive.exec.stagingdir":".hive-staging"

Description of above property -

Directory name that will be created inside table locations in order to support HDFS encryption. This replaces ${hive.exec.scratchdir} for query results with the exception of read-only tables. In all cases ${hive.exec.scratchdir} is still used for other temporary files, such as job plans.
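
To check which staging directory a session is actually using, the property can be printed from the Hive CLI or Beeline. This is a minimal sketch; your cluster may override the default quoted above.

```sql
-- A SET with a key and no value prints the current setting for that key.
-- With the default, this should report the relative name .hive-staging.
SET hive.exec.stagingdir;
```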

So if there are concurrent jobs in the application layer code (ETL) that perform operations (rename/delete/move) on the same table, it may lead to this problem.

In our case, two concurrent jobs were running "INSERT OVERWRITE" on the same table. That led to the metadata file of one mapper being deleted, which caused this issue.
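
The collision can be pictured as two sessions running at the same time. This is only a sketch of the scenario described here: the table names daily_report, source_a, and source_b are hypothetical, and the comments restate the explanation in this answer rather than anything taken from logs.

```sql
-- Session A (ETL job 1)
INSERT OVERWRITE TABLE daily_report
SELECT * FROM source_a;

-- Session B (ETL job 2), started while session A is still running
INSERT OVERWRITE TABLE daily_report
SELECT * FROM source_b;

-- With a relative hive.exec.stagingdir such as .hive-staging, both jobs keep
-- their intermediate files under the location of daily_report. When one job
-- overwrites the table, files the other job still needs can disappear, and
-- the other job then fails with the exceptions listed at the top.
```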

Resolution -

Move the metadata file location outside the table location (our table lies in S3).
Disable HDFS encryption (as mentioned in the description of the stagingdir property).
Change your application layer code to avoid the concurrency issue.
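
For the first option, one possible sketch is to point the staging directory at a path outside the table location before running the insert. The path /tmp/hive/.hive-staging and the table names are assumptions chosen for illustration. Verify on your Hive version where the directory actually ends up being created.

```sql
-- Assumed path, for illustration only: any directory outside the table location.
SET hive.exec.stagingdir=/tmp/hive/.hive-staging;

-- Hypothetical tables; inserts run later in this session use the setting above.
INSERT OVERWRITE TABLE daily_report
SELECT * FROM source_a;
```

The same property can also be set for all sessions in hive-site.xml instead of per session.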

Related question - Why hive_staging file is missing in AWS EMR

CodeHunter
