
How to connect to a CDH cluster from a remote Python service


Apache Hadoop provides WebHDFS, a REST API that exposes HDFS operations over HTTP. This lets you manipulate files in HDFS with any Python HTTP client library, such as httplib, urllib, or urllib2 (http.client and urllib.request in Python 3). In fact, any programming language with an HTTP client library can access WebHDFS.
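A minimal sketch using only the Python 3 standard library. It assumes WebHDFS is enabled on the NameNode, the cluster uses simple (non-Kerberos) authentication (a Kerberized cluster needs SPNEGO instead), and the NameNode HTTP port is 50070 (Hadoop 2 / CDH 5) or 9870 (Hadoop 3 / CDH 6). The helper names are hypothetical; only the `/webhdfs/v1/<path>?op=...` URL scheme and the `LISTSTATUS`/`OPEN` operations come from WebHDFS itself.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def build_webhdfs_url(host, port, hdfs_path, op, user=None, **params):
    """Build http://<host>:<port>/webhdfs/v1/<path>?op=<OP>[&user.name=...]."""
    query = {"op": op}
    if user is not None:
        # Under simple authentication, WebHDFS takes the caller's identity from user.name.
        query["user.name"] = user
    query.update(params)  # extra operation parameters, e.g. offset/length for OPEN
    path = quote(hdfs_path.lstrip("/"))  # percent-encode, keeping "/" separators
    return f"http://{host}:{port}/webhdfs/v1/{path}?{urlencode(query)}"


def parse_liststatus(body):
    """Turn a LISTSTATUS JSON response into (name, type) pairs."""
    statuses = json.loads(body)["FileStatuses"]["FileStatus"]
    return [(s["pathSuffix"], s["type"]) for s in statuses]


def list_dir(host, port, hdfs_path, user):
    """List a directory with op=LISTSTATUS (hypothetical helper)."""
    url = build_webhdfs_url(host, port, hdfs_path, "LISTSTATUS", user=user)
    with urlopen(url, timeout=30) as resp:
        return parse_liststatus(resp.read().decode("utf-8"))


def read_file(host, port, hdfs_path, user):
    """Read a file with op=OPEN; the NameNode redirects to a DataNode and urlopen follows it."""
    url = build_webhdfs_url(host, port, hdfs_path, "OPEN", user=user)
    with urlopen(url, timeout=30) as resp:
        return resp.read()
```

For example, `list_dir("namenode.example.com", 50070, "/user/alice", "alice")` (hypothetical host and user) would return pairs such as `("a.txt", "FILE")`. Because `OPEN` answers with a redirect to a DataNode, the remote service must be able to resolve and reach the DataNode hostnames as well as the NameNode.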

You could also use Pydoop, which provides a more direct integration between Python and HDFS. Pydoop's HDFS API is built on libhdfs, a C wrapper (via JNI) over the standard HDFS Java client, so it talks to the cluster over the native HDFS RPC protocol rather than HTTP.
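For comparison, a sketch of the same two operations through Pydoop's `pydoop.hdfs` module. It assumes Pydoop is installed on the remote service's host together with a Hadoop client (a JVM, the Hadoop jars that libhdfs loads, and the CDH cluster's configuration in `HADOOP_CONF_DIR`), so bare paths resolve against `fs.defaultFS`. The wrapper names are hypothetical, and the exact signatures and return types of `pydoop.hdfs.ls` and `pydoop.hdfs.load` should be checked against the documentation for your Pydoop version.

```python
# Hypothetical wrappers around pydoop.hdfs. Pydoop is imported lazily so this
# module still loads on machines without Pydoop or a Hadoop client installed.


def ls_via_pydoop(hdfs_path):
    """List hdfs_path over native HDFS RPC (libhdfs), not HTTP."""
    import pydoop.hdfs as hdfs
    return hdfs.ls(hdfs_path)


def read_via_pydoop(hdfs_path):
    """Read a whole HDFS file over native HDFS RPC (libhdfs)."""
    import pydoop.hdfs as hdfs
    return hdfs.load(hdfs_path)
```

Unlike the WebHDFS approach, no NameNode host or HTTP port appears in the code: the cluster address comes from the local Hadoop configuration, or can be given as a full URI such as `hdfs://namenode.example.com:8020/user/alice` (hypothetical host; 8020 is the usual CDH NameNode RPC port). The trade-off is that the remote service now needs a working Hadoop client installation, whereas WebHDFS needs only an HTTP library.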