Which nodejs library should I use to write into HDFS?

Which nodejs library should I use to write into HDFS?


You may want to check out the webhdfs library. It provides a nice, straightforward interface (similar to the fs module API) for WebHDFS REST API calls.

Writing to the remote file:

```javascript
var fs = require('fs');
var WebHDFS = require('webhdfs');
var hdfs = WebHDFS.createClient();

var localFileStream = fs.createReadStream('/path/to/local/file');
var remoteFileStream = hdfs.createWriteStream('/path/to/remote/file');

localFileStream.pipe(remoteFileStream);

remoteFileStream.on('error', function onError (err) {
  // Do something with the error
});

remoteFileStream.on('finish', function onFinish () {
  // Upload is done
});
```
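The writing snippet above only listens for errors on the remote stream, and plain `.pipe()` does not forward errors from the local file stream. A minimal sketch, assuming you want a Promise instead of callbacks, wraps the same `pipe` / `'error'` / `'finish'` pattern and also watches the source. `pipeAndWait` is a hypothetical helper, not part of the webhdfs API:

```javascript
'use strict';
// Minimal sketch, not part of the webhdfs API: wrap the pipe/'error'/'finish'
// pattern in a Promise. Plain .pipe() does not forward errors from the
// source, so this also listens for 'error' on the source stream.
function pipeAndWait(source, destination) {
  return new Promise((resolve, reject) => {
    let settled = false;
    // Settle exactly once, whichever event arrives first.
    const settle = (err) => {
      if (settled) return;
      settled = true;
      if (err) reject(err);
      else resolve();
    };
    source.on('error', settle);
    destination.on('error', settle);
    destination.on('finish', () => settle());
    // Note: on a source error the destination is left open; a real upload
    // may want to abort it (how to do that depends on the client library).
    source.pipe(destination);
  });
}

module.exports = { pipeAndWait };
```

With the snippet's variables, `pipeAndWait(localFileStream, remoteFileStream).then(...)` would replace the `pipe` call and both event handlers.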

Reading from the remote file:

```javascript
var WebHDFS = require('webhdfs');
var hdfs = WebHDFS.createClient();

var remoteFileStream = hdfs.createReadStream('/path/to/remote/file');

remoteFileStream.on('error', function onError (err) {
  // Do something with the error
});

remoteFileStream.on('data', function onChunk (chunk) {
  // Do something with the data chunk
});

remoteFileStream.on('finish', function onFinish () {
  // Download is done
});
```
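If the whole remote file fits in memory, the `'data'` chunks from the reading snippet can be collected into one Buffer. The reading snippet treats `'finish'` as the completion signal, while a standard Node.js Readable emits `'end'`; which one your webhdfs version emits is worth checking against its documentation, so this sketch (the `readAll` helper is hypothetical) takes the event name as a parameter:

```javascript
'use strict';
// Minimal sketch: buffer a whole remote file in memory by collecting the
// 'data' chunks. The completion event is a parameter (assumption) because
// standard Readable streams emit 'end', while the snippet above uses 'finish'.
function readAll(source, doneEvent = 'end') {
  return new Promise((resolve, reject) => {
    const chunks = [];
    source.on('data', (chunk) => {
      // Normalise string chunks so Buffer.concat can join everything.
      chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk));
    });
    source.on('error', reject);
    source.once(doneEvent, () => resolve(Buffer.concat(chunks)));
  });
}

module.exports = { readAll };
```

For example, `readAll(remoteFileStream, 'finish').then((buf) => console.log(buf.toString('utf8')))`. For large files, keep processing chunk by chunk as in the snippet instead.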


Not good news!!!

Do not use node-hdfs. Although it seems promising, it is now two years out of date. I've tried to compile it, but it does not match the symbols of the current libhdfs. If you want to use something like that, you'll have to write your own Node.js binding.

You can use node-webhdfs, but IMHO there's not much advantage in that. It is better to use a Node.js HTTP library to make your own requests. The hardest part is coping with the very asynchronous nature of Node.js: you might want to first create a folder, then, once it has been created successfully, create a file, and finally write or append data. Everything goes through HTTP requests, and you must send each one and wait for its response before going on.
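The sequence described above (create a folder, then a file, then append) can be sketched with Node's built-in `http` module and async/await, so each request starts only after the previous one succeeded. The operations follow the Hadoop WebHDFS REST API: `MKDIRS`, and the two-step `CREATE` / `APPEND`, where the NameNode first answers with a 307 redirect to a DataNode and the data is sent only to that location. The helper names, the `base` URL, and the assumption of an unsecured cluster using simple authentication via `user.name` are illustrative, not from the answer:

```javascript
'use strict';
// Minimal sketch: sequential WebHDFS REST calls with Node's built-in http.
// Assumptions: NameNode reachable over plain HTTP at `base`, simple
// (non-Kerberos) authentication through the user.name query parameter.
const http = require('http');

// Send one request and resolve with { statusCode, headers, body }.
function request(method, url, body) {
  return new Promise((resolve, reject) => {
    const req = http.request(url, { method }, (res) => {
      const chunks = [];
      res.on('data', (c) => chunks.push(c));
      res.on('end', () => resolve({
        statusCode: res.statusCode,
        headers: res.headers,
        body: Buffer.concat(chunks).toString('utf8'),
      }));
      res.on('error', reject);
    });
    req.on('error', reject);
    req.end(body); // body may be undefined (no payload)
  });
}

// Build e.g. <base>/webhdfs/v1/<path>?op=MKDIRS&user.name=<user>
function webhdfsUrl(base, path, op, user, extra = {}) {
  const url = new URL('/webhdfs/v1' + path, base);
  url.searchParams.set('op', op);
  url.searchParams.set('user.name', user);
  for (const [k, v] of Object.entries(extra)) url.searchParams.set(k, String(v));
  return url;
}

function expectStatus(res, expected, step) {
  if (res.statusCode !== expected) {
    throw new Error(`${step} failed: HTTP ${res.statusCode} ${res.body}`);
  }
}

// CREATE/APPEND: ask the NameNode first (no data), then send the data
// to the DataNode URL given in the 307 redirect's Location header.
async function twoStep(method, nameNodeUrl, data, okStatus, step) {
  const redirect = await request(method, nameNodeUrl);
  expectStatus(redirect, 307, step + ' (redirect)');
  const res = await request(method, redirect.headers.location, data);
  expectStatus(res, okStatus, step);
}

// Folder first, then the file, then the appended data: each await
// waits for the previous HTTP response before going on.
async function writeToHdfs(base, user, dir, fileName, firstData, moreData) {
  const mk = await request('PUT', webhdfsUrl(base, dir, 'MKDIRS', user));
  expectStatus(mk, 200, 'MKDIRS');
  const file = dir + '/' + fileName;
  await twoStep('PUT', webhdfsUrl(base, file, 'CREATE', user, { overwrite: true }),
    firstData, 201, 'CREATE');
  await twoStep('POST', webhdfsUrl(base, file, 'APPEND', user),
    moreData, 200, 'APPEND');
}

module.exports = { writeToHdfs, webhdfsUrl };
```

A call might look like `writeToHdfs('http://namenode-host:port', 'someuser', '/tmp/demo', 'log.txt', 'hello ', 'world')`, with host, port, and user as placeholders. The redirect targets DataNode hostnames, so those must also be reachable from the machine running Node.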

At least node-webhdfs might be a good reference for you to look at before starting your own code.

Br,
Fabio Moreira