
HDFS access from remote host through Java API, user authentication


After some studying I came to the following solution:

  • I don't actually need a full Kerberos solution; for now it is enough that clients can run HDFS requests as any user. The environment itself is considered secure.
  • This leads to a solution based on the Hadoop UserGroupInformation class. In the future I can extend it to support Kerberos.

Sample code, probably useful both for 'fake authentication' and for remote HDFS access:

package org.myorg;

import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class HdfsTest {

    public static void main(String[] args) {
        try {
            UserGroupInformation ugi
                = UserGroupInformation.createRemoteUser("hbase");

            ugi.doAs(new PrivilegedExceptionAction<Void>() {

                public Void run() throws Exception {
                    Configuration conf = new Configuration();
                    conf.set("fs.defaultFS", "hdfs://1.2.3.4:8020/user/hbase");
                    conf.set("hadoop.job.ugi", "hbase");

                    FileSystem fs = FileSystem.get(conf);

                    fs.createNewFile(new Path("/user/hbase/test"));

                    FileStatus[] status = fs.listStatus(new Path("/user/hbase"));
                    for (int i = 0; i < status.length; i++) {
                        System.out.println(status[i].getPath());
                    }
                    return null;
                }
            });
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Useful reference for those who have a similar problem:

  • Cloudera blog post "Authorization and Authentication In Hadoop". Short, focused on a simple explanation of Hadoop's security approaches. Nothing specific to the Java API, but good for a basic understanding of the problem.

UPDATE:
An alternative for those who use the command-line hdfs or hadoop utility and don't need a matching local user:

 HADOOP_USER_NAME=hdfs hdfs dfs -put /root/MyHadoop/file1.txt /

What actually happens is that you read the local file according to your local permissions, but when placing the file on HDFS you are authenticated as the user hdfs.

This has pretty similar properties to the API code illustrated above:

  1. You don't need sudo.
  2. You don't actually need an appropriate local user 'hdfs'.
  3. You don't need to copy anything or change permissions, because of the previous points.
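The same trick also works from the Java API: with simple (non-Kerberos) authentication, the Hadoop client login consults HADOOP_USER_NAME (as an environment variable, and in recent versions also as a JVM system property) before falling back to the OS login name. The lookup order can be sketched in plain Java; note that resolveHadoopUser below is a hypothetical re-implementation for illustration, not Hadoop's actual code:

```java
public class HadoopUserResolution {

    // Hypothetical sketch of the lookup Hadoop's simple authentication
    // performs: the environment variable wins, then the JVM system
    // property, then the operating-system login name.
    static String resolveHadoopUser() {
        String user = System.getenv("HADOOP_USER_NAME");
        if (user == null || user.isEmpty()) {
            user = System.getProperty("HADOOP_USER_NAME");
        }
        if (user == null || user.isEmpty()) {
            user = System.getProperty("user.name");
        }
        return user;
    }

    public static void main(String[] args) {
        // Must be set before the first FileSystem/UserGroupInformation
        // call; equivalent to HADOOP_USER_NAME=hdfs in the shell.
        System.setProperty("HADOOP_USER_NAME", "hdfs");
        System.out.println(resolveHadoopUser());
    }
}
```

So if all you need is to impersonate a fixed user, setting the property (or exporting the variable before launching the JVM) can replace the createRemoteUser/doAs boilerplate entirely.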