Reading Files in HDFS (Hadoop filesystem) directories into a Pandas dataframe


It looks like the pydoop.hdfs module solves this problem while meeting a good set of the goals:

http://pydoop.sourceforge.net/docs/tutorial/hdfs_api.html

I was not able to evaluate this, as pydoop has very strict build requirements and my Hadoop version is a bit dated.
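For anyone who can get pydoop installed, here is a rough, untested-on-a-cluster sketch of how this might look. It assumes the directory is flat and contains CSV files with identical columns, and that the file object returned by `pydoop.hdfs.open` in text mode (`"rt"`) can be handed straight to `pandas.read_csv`. The function name `read_hdfs_dir` and the `fs` and `suffix` parameters are my own, not part of pydoop.

```python
import pandas as pd


def read_hdfs_dir(directory, fs=None, suffix=".csv", **read_csv_kwargs):
    """Concatenate every matching file in an HDFS directory into one DataFrame.

    `fs` is any object exposing `ls(path)` and `open(path, mode)`.
    When omitted, pydoop.hdfs is imported and used (assumption: it is installed).
    """
    if fs is None:
        import pydoop.hdfs as fs  # deferred so the module loads without pydoop

    frames = []
    for path in sorted(fs.ls(directory)):
        # Skip markers such as _SUCCESS and anything that is not a data file.
        if not path.endswith(suffix):
            continue
        with fs.open(path, "rt") as handle:
            frames.append(pd.read_csv(handle, **read_csv_kwargs))

    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

Called as `read_hdfs_dir("/user/me/data")` (a made-up path), it should return a single dataframe with the rows of all files in path order. Extra keyword arguments such as `sep="\t"` are passed through to `read_csv`. Taking `fs` as a parameter also makes it possible to try the logic with a stand-in object before touching a real cluster.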