How do I specify the input for each map step in MRJob?
You can use runners.
You will have to define the jobs separately and use another Python script to invoke them, passing the output directory of the first job as an input to the second.
    from NumLines import NumLines
    from WordsPerLine import WordsPerLine
    import sys

    # Output directory of the first job, shared with the second job.
    intermediate = None

    def firstJob(input_file):
        global intermediate
        mr_job = NumLines(args=[input_file])
        with mr_job.make_runner() as runner:
            runner.run()
            # Remember where the first job wrote its results.
            intermediate = runner.get_output_dir()

    def secondJob(input_file):
        # The second job reads the first job's output and the original input.
        mr_job = WordsPerLine(args=[intermediate, input_file])
        with mr_job.make_runner() as runner:
            runner.run()

    if __name__ == '__main__':
        firstJob(sys.argv[1])
        secondJob(sys.argv[1])
Save the script as main_script.py and invoke it with:
    python main_script.py input.txt
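One thing to watch: as far as I know, with mrjob's default cleanup settings a runner may delete its temporary output once the `with` block exits, so the directory returned by get_output_dir() might be gone by the time the second job runs. A minimal sketch of one way around this is to choose the intermediate directory yourself and hand it to the first job via mrjob's --output-dir option. The helper names (first_job_args, second_job_args, plan_pipeline) and the numlines_output directory name are made up for illustration; they are not part of mrjob.

```python
import os
import tempfile


def first_job_args(input_file, output_dir):
    """Args for the first job: read input_file, write results to output_dir."""
    # --output-dir is mrjob's option for choosing where job output is written.
    return [input_file, '--output-dir', output_dir]


def second_job_args(intermediate_dir, input_file):
    """Args for the second job: the first job's output plus the original input."""
    return [intermediate_dir, input_file]


def plan_pipeline(input_file, work_dir=None):
    """Build the argument lists for both jobs around one intermediate directory."""
    if work_dir is None:
        # A working directory owned by this script rather than by a runner.
        work_dir = tempfile.mkdtemp(prefix='mrjob_chain_')
    # Hypothetical name; the directory itself is not created here,
    # on the assumption that the first job creates it when it writes output.
    intermediate_dir = os.path.join(work_dir, 'numlines_output')
    return (first_job_args(input_file, intermediate_dir),
            second_job_args(intermediate_dir, input_file))
```

The two lists would then be passed as NumLines(args=first_args) and WordsPerLine(args=second_args), in place of the global variable.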