
Reading remote HDFS file with Java


Hadoop error messages are frustrating. Often they don't say what they mean and have nothing to do with the real issue. I've seen problems like this occur when the client, namenode, and datanode cannot communicate properly. In your case I would suspect one of two issues:

  • Your cluster runs in a VM and its virtualized network access to the client is blocked.
  • You are not consistently using fully-qualified domain names (FQDN) that resolve identically between the client and host.

The host name "test.server" is very suspicious. Check all of the following:

  • Is test.server a FQDN?
  • Is this the name that has been used EVERYWHERE in your conf files?
  • Can the client and all hosts forward and reverse resolve "test.server" and its IP address and get the same thing?
  • Are IP addresses being used instead of FQDN anywhere?
  • Is "localhost" being used anywhere?

Any inconsistency in the use of FQDN, hostname, numeric IP, and localhost must be removed. Do not ever mix them in your conf files or in your client code. Consistent use of FQDN is preferred. Consistent use of numeric IP usually also works. Use of unqualified hostnames, localhost, or 127.0.0.1 causes problems.
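If you want to verify resolution quickly, here is a minimal sketch (plain JDK, no Hadoop classes; "test.server" stands in for whatever name your conf files actually use). Run it on the client and on every cluster host and compare the output:

import java.net.InetAddress;

public class ResolveCheck {
    public static void main(String[] args) throws Exception {
        String host = "test.server";                      // the name used in your conf files
        InetAddress addr = InetAddress.getByName(host);   // forward lookup: name -> IP
        System.out.println(host + " -> " + addr.getHostAddress());

        String reverse = addr.getCanonicalHostName();     // reverse lookup: IP -> name
        System.out.println(addr.getHostAddress() + " -> " + reverse);
        // The forward and reverse results must agree everywhere, or HDFS clients will misbehave.
    }
}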


We need to make sure the configuration has fs.default.name set, for example:

configuration.set("fs.default.name","hdfs://ourHDFSNameNode:50000");
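(Side note: on Hadoop 2.x the same setting is usually written with the newer key fs.defaultFS; fs.default.name is deprecated but still honored, so either form should work:)

configuration.set("fs.defaultFS", "hdfs://ourHDFSNameNode:50000");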

Below I've put a piece of sample code:

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration configuration = new Configuration();
configuration.set("fs.default.name", "hdfs://ourHDFSNameNode:50000");

// Path of the file to read; adjust to your own file.
Path pt = new Path("/path/to/file.txt");
FileSystem fs = pt.getFileSystem(configuration);

// try-with-resources closes the stream; readLine() returns null at end of file.
try (BufferedReader br = new BufferedReader(new InputStreamReader(fs.open(pt)))) {
    String line;
    while ((line = br.readLine()) != null) {
        System.out.println(line);
    }
}
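As an aside, if you prefer not to set the property at all, you can pass the namenode address directly in the URI. A rough sketch (host, port, and path are placeholders):

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Connect to the namenode named in the URI instead of relying on fs.default.name.
FileSystem fs = FileSystem.get(URI.create("hdfs://ourHDFSNameNode:50000/"), new Configuration());
Path pt = new Path("/path/to/file.txt");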


The answer above points in the right direction. Allow me to add the following:

  1. Namenode does NOT directly read or write data.
  2. The client (your Java program with direct access to HDFS) interacts with the Namenode to update the HDFS namespace and retrieve block locations for reading/writing.
  3. Client interacts directly with Datanode to read/write data.

You were able to list directory contents because hostname:9000 was accessible to your client code. You were doing number 2 above.
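To see number 2 in isolation, a small sketch like this (reusing the fs and pt variables from the sample code above, so the same assumptions apply) asks the Namenode for block locations and prints the Datanode hosts that an actual read would have to contact:

import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;

// Metadata only: this talks to the Namenode, no Datanode is involved yet.
FileStatus status = fs.getFileStatus(pt);
BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
for (BlockLocation block : blocks) {
    // These are the Datanodes your client must be able to reach to read the data.
    System.out.println(String.join(", ", block.getHosts()));
}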
To be able to read and write, your client code needs access to the Datanode (number 3). The default port for Datanode DFS data transfer is 50010. Something was blocking your client's communication with hostname:50010, possibly a firewall or an SSH tunneling configuration problem.
I was using Hadoop 2.7.2, so maybe you have a different port number setting.
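If you want to rule a firewall in or out quickly, a plain socket test against the Datanode transfer port is enough; 50010 is the Hadoop 2.x default (governed by dfs.datanode.address), and "test.server" below is a placeholder for your Datanode host:

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
    public static void main(String[] args) {
        // Datanode data-transfer port; 50010 is the Hadoop 2.x default.
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress("test.server", 50010), 5000);
            System.out.println("Datanode port is reachable");
        } catch (IOException e) {
            System.out.println("Cannot reach Datanode port: " + e.getMessage());
        }
    }
}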