Reading Files in HDFS (Hadoop filesystem) directories into a Pandas dataframe
The pydoop.hdfs module appears to solve this problem while meeting most of the goals:
http://pydoop.sourceforge.net/docs/tutorial/hdfs_api.html
I was not able to evaluate it myself, because pydoop has very strict build requirements and my Hadoop version is a bit dated.
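
For anyone who can get pydoop installed, here is a rough sketch of how the approach might look. It has not been run against a real cluster. It assumes the directory contains only CSV files with matching columns, and it relies on `pydoop.hdfs.ls` to list paths and `pydoop.hdfs.open` to return a file-like object that `pandas.read_csv` can consume. The helper name `read_hdfs_dir` and its `list_files` / `open_file` parameters are made up for illustration; they exist only so the function can be tried on a local directory without Hadoop.

```python
import pandas as pd


def read_hdfs_dir(directory, list_files=None, open_file=None, **read_csv_kwargs):
    """Concatenate every CSV file in `directory` into one DataFrame.

    `list_files` and `open_file` are hypothetical hooks (not part of pydoop);
    when omitted they fall back to pydoop.hdfs.ls and pydoop.hdfs.open.
    """
    if list_files is None or open_file is None:
        # Imported lazily so the helper can be exercised without pydoop.
        import pydoop.hdfs as hdfs
        list_files = list_files or hdfs.ls
        open_file = open_file or hdfs.open

    frames = []
    # Sort the paths so the row order is reproducible between runs.
    for path in sorted(list_files(directory)):
        with open_file(path) as f:
            frames.append(pd.read_csv(f, **read_csv_kwargs))

    # Assumption: every file shares the same columns.
    return pd.concat(frames, ignore_index=True)
```

With pydoop available, the call would be something like `df = read_hdfs_dir("/user/me/data")`, where the path is a placeholder. Real HDFS output directories often contain marker files such as `_SUCCESS`, so you would probably want to filter the listing before reading.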