Running any Hadoop command fails after enabling security.


I ran into a problem in which I had a Kerberized CDH cluster and even with a valid Kerberos ticket, I couldn't run any hadoop commands from the command line.

NOTE: After writing this answer I wrote it up as a blog post at http://sarastreeter.com/2016/09/26/resolving-hadoop-problems-on-kerberized-cdh-5-x/ . Please share!

So even with a valid ticket, this would fail:

$ hadoop fs -ls /
WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]

Here is what I learned and how I ended up resolving the problem. I have linked to Cloudera doc for the current version where possible, but some of the doc seems to be present only for older versions.

Please note that the problem comes down to a configuration issue: Kerberos itself and Cloudera Manager were both installed correctly. Many of the problems I ran across while searching for answers were caused by Kerberos or Hadoop being installed incorrectly. My problem occurred even though both Hadoop and Kerberos were functional; they just were not configured to work together properly.

TL;DR

MAKE SURE YOU HAVE A TICKET

Run klist as the user that will be executing the hadoop command.

$ sudo su - myuser
$ klist

If you don't have a ticket, it will print:

klist: Credentials cache file '/tmp/krb5cc_0' not found

If you try to do a hadoop command without a ticket you will get the GSS INITIATE FAILED error by design:

WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
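
If klist shows that you have no ticket, just get one and retry before digging any deeper. A minimal example, where the keytab path and principal are placeholders for your own:

$ kinit -kt /path/to/myuser.keytab myuser@EXAMPLE.COM
$ klist
$ hadoop fs -ls /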

In other words, without a ticket that error is expected and is not an install problem. If you have a valid ticket and still see these failures, take a look at the sections below.


CDH DEFAULT HDFS USER AND GROUP RESTRICTIONS

A default install of Cloudera has user and group restrictions on execution of hadoop commands, including a specific ban on certain users (more on page 57 of http://www.cloudera.com/documentation/enterprise/5-6-x/PDF/cloudera-security.pdf).

There are several properties that deal with this, including:

  • the HDFS supergroup being set to the string supergroup instead of hdfs
  • the dfs_permissions property (hadoop user file permissions) being set to false by default
  • users with a UID below 1000 being banned from running jobs (the min.user.id setting)

Any of these could be a factor; for me it was hdfs being listed in the banned.users property.

Specifically for the hdfs user: if you are trying to use it to execute hadoop commands, make sure hdfs has been removed from the banned.users configuration property in hdfs-site.xml.
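
A quick way to check whether hdfs is still listed is to grep the configuration the services actually run with; the paths below assume a Cloudera Manager-managed cluster and may differ on your install:

$ grep -r -A 3 "banned.users" /etc/hadoop/conf /var/run/cloudera-scm-agent/process 2>/dev/null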

  1) UNPRIVILEGED USER AND WRITE PERMISSIONS

The Cloudera-recommended way to execute hadoop commands is to create an unprivileged user and a matching principal, instead of using the hdfs user. A gotcha is that this user also needs its own directory under /user; if the unprivileged user does not have a directory in /user, commands may fail with a WRITE permission denied error.
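
If the directory is missing, create and chown it while authenticated as a principal that maps to the hdfs superuser; a sketch, with myuser as a placeholder for your unprivileged user:

$ sudo -u hdfs hadoop fs -mkdir /user/myuser
$ sudo -u hdfs hadoop fs -chown myuser:myuser /user/myuser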

Cloudera Knowledge Article

http://community.cloudera.com/t5/CDH-Manual-Installation/How-to-resolve-quot-Permission-denied-quot-errors-in-CDH/ta-p/36141

  2) DATANODE PORTS AND DATA DIR PERMISSIONS

Another related issue is that Cloudera sets the dfs.datanode.data.dir permissions to 750 on a non-Kerberized cluster, but requires 700 on a Kerberized cluster; with the wrong directory permissions set, the Kerberos install will fail. The DataNode ports must also be set to privileged values below 1024: the recommended values are 1006 for the HTTP port and 1004 for the DataNode (data transfer) port.
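
For example, with /data/1/dfs/dn as a placeholder for your dfs.datanode.data.dir (on a real cluster these settings are normally managed through Cloudera Manager rather than by hand):

$ sudo chmod 700 /data/1/dfs/dn
$ hdfs getconf -confKey dfs.datanode.address        # should end in :1004 on a Kerberized cluster
$ hdfs getconf -confKey dfs.datanode.http.address   # should end in :1006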

Datanode Directory

http://www.cloudera.com/documentation/enterprise/5-6-x/topics/cdh_ig_hdfs_cluster_deploy.html

Datanode Ports

http://www.cloudera.com/documentation/archive/manager/4-x/4-7-2/Configuring-Hadoop-Security-with-Cloudera-Manager/cmchs_enable_security_s9.html

  3) SERVICE-SPECIFIC CONFIGURATION TASKS 

On page 60 of the security doc, there are steps to kerberize Hadoop services. Make sure you did these!

MapReduce
$ sudo -u hdfs hadoop fs -chown mapred:hadoop ${mapred.system.dir}
HBase
$ sudo -u hdfs hadoop fs -chown -R hbase ${hbase.rootdir}
Hive
$ sudo -u hdfs hadoop fs -chown hive /user/hive
YARN
$ rm -rf ${yarn.nodemanager.local-dirs}/usercache/*

All of these steps EXCEPT the YARN one can happen at any time. The YARN step must happen after the Kerberos installation, because it removes the user cache left over from non-Kerberized YARN jobs. When you run MapReduce after the Kerberos install, it should repopulate this cache with the Kerberized user cache data.
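
Note that yarn.nodemanager.local-dirs is a list of local filesystem paths on each NodeManager, so the cleanup has to be run on every NodeManager host; for example, with /yarn/nm as a placeholder for your configured value:

$ sudo rm -rf /yarn/nm/usercache/*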

YARN User Cache

YARN Application exited with exitCode: -1000 Not able to initialize user directories

KERBEROS PRINCIPAL ISSUES

  1) SHORT NAME RULES MAPPING

Kerberos principals are "mapped" to OS-level service users. For example, hdfs/WHATEVER@REALM maps to the service user 'hdfs' in your operating system only because of a name mapping rule (hadoop.security.auth_to_local) set in Hadoop's core-site. Without name mapping, Hadoop wouldn't know which user is authenticated by which principal.

If you are using a principal that should map to hdfs, make sure the principal name resolves correctly to hdfs according to these Hadoop rules.

Good

(has a name mapping rule by default)

  • hdfs@REALM
  • hdfs/_HOST@REALM
Bad

(no name mapping rule by default)

  • hdfs-TAG@REALM

The "bad" example will not work unless you add a rule to accommodate it

Name Rules Mapping

http://www.cloudera.com/documentation/archive/cdh/4-x/4-5-0/CDH4-Security-Guide/cdh4sg_topic_19.html

  2) KEYTAB AND PRINCIPAL KEY VERSION NUMBERS MUST MATCH

The Key Version Number (KVNO) is the version of the key that is actively being used (as if you had a house key but then changed the lock on the door so it takes a new key; the old key is no longer any good). Both the keytab and the principal have a KVNO, and the version numbers must match.

By default, when you use ktadd or xst to export a principal to a keytab, it re-randomizes the principal's keys and bumps the principal's KVNO, so any previously exported keytab (and anything still using the old key) no longer matches. You can end up accidentally creating a mismatch this way.

Use -norandkey with kadmin or kadmin.local when exporting a principal to a keytab to keep the existing keys and avoid creating a KVNO mismatch.
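
For example, with a placeholder principal and keytab path (-norandkey generally requires kadmin.local, or the right privileges in newer MIT Kerberos releases):

$ kadmin.local -q 'xst -norandkey -k /etc/security/keytabs/myuser.keytab myuser@EXAMPLE.COM'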

In general, whenever you are having principal or authentication issues, make sure to check that the KVNO of the principal and the keytab match:

Principal
$ kadmin.local -q 'getprinc myprincipalname'
Keytab
$ klist -kte mykeytab
Creating Principals

http://www.cloudera.com/documentation/archive/cdh/4-x/4-3-0/CDH4-Security-Guide/cdh4sg_topic_3_4.html

SECURITY JARS AND JAVA HOME

  1) JAVA VERSION MISMATCH WITH JCE JARS

Hadoop needs the Java security JCE Unlimited Strength jars installed in order to use AES-256 encryption with Kerberos. Both Hadoop and Kerberos need to have access to these jars. This is an install issue but it is easy to miss because you can think you have the security jars installed when you really don't.

JCE Configurations to Check:
  • the jars are the right version. The correct security jars are bundled with Java, but if you install them after the fact you have to make sure the version of the jars corresponds to the version of Java, or you will continue to get errors. To troubleshoot, check the md5sum of the jars from a fresh download of the JDK version you're using against the md5sum of the jars on the Kerberos server.
  • the jars are in the right location: $JAVA_HOME/jre/lib/security
  • Hadoop is configured to look for them in the right place: check that there is an export statement setting $JAVA_HOME to the correct Java install location in /etc/hadoop/conf/hadoop-env.sh

If Hadoop has JAVA_HOME set incorrectly it will fail with "GSS INITIATE FAILED". If the jars are not in the right location, Kerberos won't find them and will give an error that it doesn't support the AES-256 encryption type (UNSUPPORTED ENCTYPE).
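
A quick way to confirm that unlimited-strength crypto is actually available to the JVM Hadoop uses is the jrunscript tool that ships with the JDK; it should print true if the JCE policy jars are in place:

$ $JAVA_HOME/bin/jrunscript -e 'print(javax.crypto.Cipher.getMaxAllowedKeyLength("AES") >= 256);'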

Cloudera with JCE Jars

http://www.cloudera.com/documentation/enterprise/5-5-x/topics/cm_sg_s2_jce_policy.html

Troubleshooting JCE Jars

https://community.cloudera.com/t5/Cloudera-Manager-Installation/Problem-with-Kerberos-amp-user-hdfs/td-p/6809

TICKET RENEWAL WITH JDK 6 AND MIT KERBEROS 1.8.1 AND HIGHER

Cloudera has an issue documented at http://www.cloudera.com/documentation/archive/cdh/3-x/3u6/CDH3-Security-Guide/cdh3sg_topic_14_2.html in which tickets must be renewed before hadoop commands can be issued. This only happens with Oracle JDK 6 Update 26 or earlier and package version 1.8.1 or higher of the MIT Kerberos distribution.

To check the package, do an rpm -qa | grep krb5 on CentOS/RHEL or aptitude search krb5 -F "%c %p %d %V" on Debian/Ubuntu.

So do a regular kinit as you would, then do a kinit -R to force the ticket to be renewed.

$ kinit -kt mykeytab myprincipal
$ kinit -R

And finally, the issue I actually had which I could not find documented anywhere ...

CONFIGURATION FILES AND TICKET CACHING

There are two important configuration files for Kerberos: krb5.conf, which configures the Kerberos clients and libraries, and kdc.conf, which configures the krb5kdc service and the KDC database. My problem was that the krb5.conf file had the property default_ccache_name = KEYRING:persistent:%{uid}.

This set my credential cache to the KEYRING:persistent type, keyed by user uid (explained at https://web.mit.edu/kerberos/krb5-1.13/doc/basic/ccache_def.html), so when I did a kinit the ticket did not go into the file-based cache in /tmp that the hadoop command was looking for. Cloudera services obtain authentication with files generated at runtime in /var/run/cloudera-scm-agent/process, and these all export the cache name environment variable (KRB5CCNAME) before doing their kinit. That's why Cloudera could obtain tickets but my hadoop user couldn't.

The solution was to remove the line from krb5.conf that set default_ccache_name and allow kinit to store credentials in /tmp, which is the MIT Kerberos default value DEFCCNAME (documented at https://web.mit.edu/kerberos/krb5-1.13/doc/mitK5defaults.html#paths).
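
A quick way to see where your tickets are actually going is the first line of klist output. With the keyring setting it will show something like KEYRING:persistent:1000, while the stock default looks like this (the uid is just an example):

$ klist | head -1
Ticket cache: FILE:/tmp/krb5cc_1000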

Cloudera and Kerberos installation guides:

Step-by-Step

https://www.cloudera.com/documentation/enterprise/5-6-x/topics/cm_sg_intro_kerb.html

Advanced troubleshooting

http://www.cloudera.com/documentation/enterprise/5-6-x/PDF/cloudera-security.pdf, starting on page 48.