Get input file name in streaming hadoop program
According to "Hadoop: The Definitive Guide":
Hadoop sets job configuration parameters as environment variables for Streaming programs. However, it replaces non-alphanumeric characters with underscores to make sure they are valid names. The following Python expression illustrates how you can retrieve the value of the mapred.job.id property from within a Python Streaming script:
os.environ["mapred_job_id"]
You can also set environment variables for the Streaming process launched by MapReduce by applying the -cmdenv option to the Streaming launcher program (once for each variable you wish to set). For example, the following sets the MAGIC_PARAMETER environment variable:
-cmdenv MAGIC_PARAMETER=abracadabra
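The renaming rule quoted above can be sketched in Python as follows. This is a minimal sketch, not part of Hadoop: the helper names conf_env_name and get_job_conf are hypothetical, and the optional environ argument exists only so the lookup can be tried outside a real job.

```python
import os
import re


def conf_env_name(prop):
    """Return the environment variable name for a job property.

    Hypothetical helper: replaces every non-alphanumeric character
    with an underscore, as the quoted passage describes.
    """
    return re.sub(r"[^A-Za-z0-9]", "_", prop)


def get_job_conf(prop, default=None, environ=None):
    """Look up a job property in the environment of a Streaming task.

    Hypothetical helper: `environ` defaults to os.environ and can be
    replaced with a plain dict for local testing.
    """
    env = os.environ if environ is None else environ
    return env.get(conf_env_name(prop), default)
```

Inside a Streaming script, get_job_conf("mapred.job.id") then reads the same variable as os.environ["mapred_job_id"], but returns None instead of raising KeyError when the variable is absent. A variable passed with -cmdenv keeps the name it was given, so os.environ["MAGIC_PARAMETER"] reads it directly.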
By parsing the mapreduce_map_input_file (new) or map_input_file (deprecated) environment variable, you will get the map input file name.
Note:
The two environment variable names are case-sensitive, and all letters are lower-case.
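As a minimal sketch of that lookup in a Python Streaming mapper: the function name current_input_file is hypothetical, and the optional environ argument is only there so the function can be tried outside a job. It checks the new variable first and falls back to the deprecated one.

```python
import os


def current_input_file(environ=None):
    """Return the map input file name, or None if it is not set.

    Hypothetical helper: tries the new variable name first, then the
    deprecated one. Both names are lower-case.
    """
    env = os.environ if environ is None else environ
    for name in ("mapreduce_map_input_file", "map_input_file"):
        value = env.get(name)
        if value:
            return value
    return None
```

The value is usually a full path or URI rather than a bare file name, so something like os.path.basename(current_input_file() or "") may be needed if only the last path component is wanted.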
The new property for Hadoop 2.x is mapreduce.map.input.file, which a Streaming program sees as the environment variable mapreduce_map_input_file.