
How can we develop a proxy for HDFS (Hadoop Distributed File System)?


Apache Knox may or may not be what you are looking for. Arnon's answer above doesn't have the correct URL, though. Please see: http://knox.apache.org/

We do not have file-level ACLs built into the provided authorization provider, but you can create a custom provider and plug that in.

Keep in mind that Knox is a proxy for WebHDFS access and does not come into play for accessing files directly through HDFS.
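To make the WebHDFS point concrete, here is a small sketch of how a WebHDFS REST URL looks when it is routed through a Knox gateway. The gateway address `https://localhost:8443/gateway` and the topology name `sandbox` are assumptions for illustration (substitute your own deployment's values), and `knox_webhdfs_url` is a hypothetical helper, not part of Knox.

```python
from urllib.parse import quote, urlencode


def knox_webhdfs_url(gateway_base, topology, hdfs_path, op, **params):
    """Build a WebHDFS REST URL that goes through a Knox gateway.

    gateway_base and topology are deployment-specific; the values used
    below are assumed for illustration only.
    """
    # WebHDFS selects the operation with the "op" query parameter.
    query = urlencode({"op": op, **params})
    # Percent-encode the HDFS path but keep "/" separators.
    path = quote(hdfs_path.lstrip("/"))
    return f"{gateway_base.rstrip('/')}/{topology}/webhdfs/v1/{path}?{query}"


# List the contents of /tmp via the gateway rather than the NameNode.
print(knox_webhdfs_url("https://localhost:8443/gateway", "sandbox", "/tmp", "LISTSTATUS"))
```

A client using the native `hdfs://` protocol never sends a request of this shape, so it never passes through the gateway at all.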

Therefore, any authorization checks that are done at the gateway will not be done when the same files are accessed directly. This is why we generally do service-level authorization checks at the gateway and leave fine-grained authorization checks to be done at the resource itself.
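As a rough illustration of that split, the toy sketch below shows what a service-level check looks like in principle. Everything in it is hypothetical (the `SERVICE_ACL` table, the group names, and the `gateway_allows` function); it is not Knox's actual provider API or configuration format.

```python
# Hypothetical policy: which groups may reach which proxied service.
# Note that it says nothing about individual files or directories.
SERVICE_ACL = {
    "WEBHDFS": {"analysts", "admins"},
    "WEBHCAT": {"admins"},
}


def gateway_allows(service, user_groups):
    """Coarse check done at the gateway: may this user call the service at all?"""
    allowed_groups = SERVICE_ACL.get(service, set())
    # Allow the request if the user belongs to at least one permitted group.
    return bool(allowed_groups & set(user_groups))


print(gateway_allows("WEBHDFS", ["analysts"]))  # True
print(gateway_allows("WEBHCAT", ["analysts"]))  # False
```

The file path is deliberately absent from the check: whether a given user may read a particular file is still decided by HDFS permissions, which apply no matter which route the request takes.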

Hope this is useful for you.


External (perimeter) security, like you mention for WebHDFS, is one thing. You can extend that to submitting jobs, etc. (in fact, this has already been done; see Apache Knox).

The other thing is not a proxy but rather an alternate implementation of the FileSystem class. It has also been implemented several times. You can see more information here