R+Hadoop: How to read CSV file from HDFS and execute mapreduce?


Pass the HDFS path directly as the input and describe the file with make.input.format:

mapreduce(input = path,
          input.format = make.input.format(...),
          map = ...)

from.dfs is meant for small data, and in most cases you won't call it inside the map function: the map function's arguments already hold a portion of the input data.
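As a rough, minimal sketch (the HDFS path /data/input.csv, the column names, and the doubling done in the map function are placeholders, not part of the original answer):

library(rmr2)

# Describe the CSV layout; col.names is an assumed, illustrative schema
csv.format <- make.input.format("csv", sep = ",",
                                col.names = c("id", "value"))

job <- mapreduce(
  input        = "/data/input.csv",   # hypothetical HDFS path
  input.format = csv.format,
  map          = function(k, v) {
    # v arrives as a data frame holding one chunk of the CSV file
    keyval(v$id, v$value * 2)
  })

# Only pull the result into local memory if it is small
result <- from.dfs(job)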


You can also do something like the following, using the rhdfs functions hdfs.file and hdfs.read.text.file:

r.file <- hdfs.file(hdfsFilePath, "r")

from.dfs(
  mapreduce(
    input = as.matrix(hdfs.read.text.file(r.file)),
    input.format = "csv",
    map = ...
  )
)
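If the result should end up as a file on HDFS rather than in local memory, you can skip from.dfs and hand mapreduce an output path instead; a sketch with placeholder paths and an identity map (not from the original answer):

out <- mapreduce(
  input         = "/data/input.csv",                    # hypothetical input path on HDFS
  output        = "/data/output",                       # result is written to this HDFS directory
  input.format  = make.input.format("csv", sep = ","),
  output.format = make.output.format("csv", sep = ","),
  map           = function(k, v) keyval(k, v))          # identity map as a placeholder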

Hope someone finds this useful.

Note: For more details, refer to the Stack Overflow post:

How to input HDFS file into R mapreduce for processing and get the result into HDFS file