
How to use a file in a hadoop streaming job using python?


hadoop jar contrib/streaming/hadoop-streaming-1.1.1.jar \
  -file ./mapper.py -mapper ./mapper.py \
  -file ./reducer.py -reducer ./reducer.py \
  -input test/input.txt -output test/output \
  -file '../user_ids'

Does ../user_ids exist on your local file system, relative to the directory you run the job from? If it does, you need to amend your mapper code: the -file option ships the file with the job, so at runtime it is available in the mapper's local working directory. Open it by its base name rather than by the relative path:

f = open('user_ids','r')
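To show how that open call might fit into a whole mapper, here is a minimal sketch. It rests on assumptions the question does not confirm: that user_ids holds one ID per line, that each input record is tab-separated with the ID in its first field, and that the goal is to keep only records for the listed IDs. The names load_ids, filter_lines and main are made up for illustration.

```python
#!/usr/bin/env python
import sys


def load_ids(path='user_ids'):
    """Read the shipped file from the task's working directory.

    Assumes one ID per line; blank lines are skipped.
    """
    with open(path, 'r') as f:
        return set(line.strip() for line in f if line.strip())


def filter_lines(lines, ids):
    """Yield the input lines whose first tab-separated field is in ids."""
    for line in lines:
        line = line.rstrip('\n')
        if line.split('\t')[0] in ids:
            yield line


def main():
    # Load the IDs once per task, then stream records from stdin to stdout.
    ids = load_ids()
    for record in filter_lines(sys.stdin, ids):
        print(record)

# A real mapper.py would end with a call to main().
```

If the file was not shipped with the job, load_ids raises an IOError and the task fails, which is usually easier to diagnose than a mapper that silently emits nothing.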


Try giving the full path of the file, or make sure that when you execute the hadoop command you are in the same directory in which the user_ids file is present.