Hadoop Configuration on Windows through Cygwin


Quick summary:

  • The hadoop bash script at (path)/bin/hadoop has a bug: it assumes that none of the files or paths Hadoop needs contain spaces. On Windows that assumption breaks easily, because "Program Files" (where the JDK usually lives) contains a space.

Details

This is a tricky one... I ran into the same problem and it took me a while to fix.

First, the problem: setting environment variables via scripts can break when the file paths or names involved contain spaces (which is fairly common on non-*nix systems these days).
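To see why this breaks, here is a minimal bash sketch. The JDK path is only an example and does not need to exist, and count_args is a throwaway helper made up for illustration. An unquoted variable holding a path with a space is split into two arguments, while the quoted form stays one:

```shell
#!/usr/bin/env bash
# Example path only; the directory does not have to exist for this demo.
java_path='/cygdrive/c/Program Files/Java/jdk1.7.0_06/bin/java'

# Hypothetical helper: prints how many separate arguments it received.
count_args() {
  echo "$#"
}

unquoted=$(count_args $java_path)   # intentionally unquoted: split at the space into 2 words
quoted=$(count_args "$java_path")   # quoted: kept together as 1 word

echo "unquoted: $unquoted argument(s)"
echo "quoted:   $quoted argument(s)"
```

This is exactly what happens to an unquoted ${JAVA} in bin/hadoop: the shell sees ".../Program" and "Files/..." as two separate words.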

Next, there are likely two places where you need to fix the problem:

  1. In your (path)/conf/hadoop-env.sh script, you should be setting the JAVA_HOME variable, and it SHOULD look something like:

    export JAVA_HOME=/cygdrive/c/"Program Files"/Java/jdk1.7.0_06

    (Note the quotation marks around "Program Files", so that it is recognized as a single element. You cannot use the \ escape character instead, because Cygwin does some finagling to convert between Windows and UNIX paths, so the \ cannot act as an escape.)

  2. In your (path)/bin/hadoop script, line 320 is likely written something like the following:

    JAVA_PLATFORM=`CLASSPATH=${CLASSPATH} ${JAVA} -Xmx32m ${HADOOP_JAVA_PLATFORM_OPTS} org.apache.hadoop.util.PlatformName | sed -e "s/ /_/g"`

    You will need to change it to instead say:

    JAVA_PLATFORM=`CLASSPATH="${CLASSPATH}" "${JAVA}" -Xmx32m ${HADOOP_JAVA_PLATFORM_OPTS} org.apache.hadoop.util.PlatformName | sed -e "s/ /_/g"`

    Note that I have added quotation marks around the environment variables ${CLASSPATH} and ${JAVA}. Putting quotation marks around them tells the shell that "the entire set of characters specified by this variable should be treated as one string", so a space inside the value no longer splits it into separate words.
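If you want to reproduce the bin/hadoop failure without Hadoop installed, the bash sketch below places a fake "java" under a directory named "Program Files" and runs it both unquoted and quoted. Assumptions: mktemp can create a temp directory where files may be executed, and the "java" it writes is a hypothetical stand-in that only echoes its arguments, not the real JDK:

```shell
#!/usr/bin/env bash
# Build a directory layout that mimics a JDK under "Program Files".
tmp=$(mktemp -d)
mkdir -p "$tmp/Program Files/Java/bin"
JAVA="$tmp/Program Files/Java/bin/java"

# Hypothetical stand-in for the real java binary: it just prints its arguments.
printf '#!/bin/sh\necho "fake java called with: $*"\n' > "$JAVA"
chmod +x "$JAVA"

# Intentionally unquoted, like the original line: the shell splits ${JAVA}
# at the space and tries to run a file ending in "/Program", which fails.
if ${JAVA} -Xmx32m 2>/dev/null; then unquoted_ok=yes; else unquoted_ok=no; fi

# Quoted, like the fixed line: "${JAVA}" stays one word, so the command runs.
if "${JAVA}" -Xmx32m; then quoted_ok=yes; else quoted_ok=no; fi

echo "unquoted ran: $unquoted_ok, quoted ran: $quoted_ok"
rm -rf "$tmp"
```

The only difference between the two calls is the pair of quotation marks, which mirrors the change to line 320.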


OK, if you care to understand why this is happening: your JDK is most likely stored under "Program Files", or possibly "Program Files (x86)", and both of those paths contain spaces. None of the other environment variables that Hadoop needs depend on anything under "Program Files", which is why you only see this one error flagged. The other variables that are missing quotes simply don't contain spaces.
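To check which variables are actually affected on your machine, a small bash helper like this one can list every variable whose value contains a space. It is a hypothetical helper, not part of Hadoop, and the list of variable names passed to it is an assumption, so adjust it to whatever your hadoop-env.sh sets:

```shell
#!/usr/bin/env bash
# vars_with_spaces NAME...: prints each NAME whose current value contains a space.
vars_with_spaces() {
  local name
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable called $name.
    if [[ ${!name} == *" "* ]]; then
      echo "$name"
    fi
  done
}

# Assumed variable list; unset variables are simply treated as empty.
while IFS= read -r name; do
  echo "WARNING: \$$name contains a space and must be quoted wherever it is expanded"
done < <(vars_with_spaces JAVA_HOME HADOOP_HOME HADOOP_CONF_DIR CLASSPATH)
```

Any variable it reports is one that needs quotes around every expansion in the scripts.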


These are fragments; the error was: hadoop.util.Platform command not found

  • CLASSPATH=`cygpath -p "$CLASSPATH"` (as shipped in the distribution): produced the error
  • CLASSPATH=`cygpath -p -w "$CLASSPATH"` (added the -w Windows flag): produced the error
  • CLASSPATH=`cygpath -wp "$CLASSPATH"`: problem resolved

This was on Vista.
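For reference, the form that worked can be wrapped so the conversion only happens under Cygwin. This is a sketch, not the actual bin/hadoop code: to_java_classpath is a hypothetical helper name, and checking `uname -s` for a CYGWIN prefix is an assumption about how you detect Cygwin. In cygpath, -w prints the Windows form and -p treats the argument as a colon-separated path list:

```shell
#!/usr/bin/env bash
# Hypothetical helper: converts a Cygwin-style CLASSPATH into the Windows
# form the JVM expects, but only when actually running under Cygwin.
to_java_classpath() {
  case "$(uname -s)" in
    CYGWIN*)
      # -w: Windows form, -p: argument is a path list (the -wp form above).
      cygpath -wp "$1"
      ;;
    *)
      # On other platforms the list is passed through unchanged.
      printf '%s\n' "$1"
      ;;
  esac
}

# Quote both the argument and the result so spaces in entries survive.
CLASSPATH="$(to_java_classpath "$CLASSPATH")"
```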