
How do I Translate XML to TSV Using Hadoop?


If you have this problem, the folks from Infochimps have solved it. Here's the necessary Wukong script:

http://thedatachef.blogspot.com/2011/01/processing-xml-records-with-hadoop-and.html
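For a sense of what a streaming mapper for this job looks like, here is a minimal sketch in plain Ruby. It is not the Wukong script from the linked post. It assumes each input line holds one complete XML record, and the field names ("id", "name") and the "record" element are made up for illustration.

```ruby
#!/usr/bin/env ruby
# Hypothetical mapper.rb: assumes one complete XML record per input line,
# e.g. <record><id>1</id><name>foo</name></record>
require 'rexml/document'

# Illustrative field names (an assumption, not from the linked post).
FIELDS = %w[id name].freeze

# Convert one XML record string into a tab-separated line.
# Returns nil when the input has no root element.
def xml_to_tsv(xml, fields = FIELDS)
  root = REXML::Document.new(xml).root
  return nil if root.nil?

  fields.map do |field|
    element = root.elements[field]
    # Tabs and newlines inside a value would break the TSV layout,
    # so they are collapsed to a single space.
    (element ? element.text.to_s : '').gsub(/[\t\r\n]+/, ' ')
  end.join("\t")
end

# Run the loop only when executed as a script with piped input,
# which is how Hadoop streaming feeds a mapper.
if __FILE__ == $PROGRAM_NAME && !$stdin.tty?
  $stdin.each_line do |line|
    line = line.strip
    next if line.empty?

    begin
      row = xml_to_tsv(line)
      puts row if row
    rescue REXML::ParseException => e
      # Report bad records on stderr so they show up in the task logs.
      warn "skipping malformed record: #{e.message.lines.first.to_s.strip}"
    end
  end
end
```

Before submitting the job you can try it locally by piping a sample file into the script and checking that tab-separated lines come out. Records that span several lines would need a different input setup than this sketch assumes.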


One common mistake is not having execute permissions on your script. Give "chmod a+x mapper.rb" a try.

Take a look in your JobTracker logs to find the specific error. You can also get the information from http://namenode:50030/jobtracker.jsp: click on the failed job, then on the "Failed" link under "Failed/Killed Task Attempts" for the map tasks.

Also, when you run your streaming job, add "-verbose" to the options; that might give you some more information.