How can I iteratively process all files under one directory using mrjob



Well, it turns out that I can simply specify a directory as the input path, and Hadoop will process every file in that directory.

Further, in my case the input files live inside sub-directories. By default, Hadoop does not traverse directories recursively and will raise an error. A common trick is to use a wildcard glob, like

python count.py hdfs://master-host/directory/*/*.txt > result