Data ingestion in Hadoop using Distcp


DistCp is a MapReduce job that runs inside the Hadoop cluster. From the cluster's perspective, the disk on your local machine is not a local file system, so you can't use it as a DistCp source. One alternative is to set up an FTP server on your machine that the Hadoop cluster can read from. Performance then depends on the network and the protocol used (FTP with Hadoop performs very poorly).

The hdfs dfs -put command could be a better fit for small amounts of data, but it doesn't copy in parallel the way DistCp does.
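
To make the choice between the two concrete, here is a minimal sketch that builds the command line for each case. The helper name `build_ingest_command`, the list of schemes, and all hosts and paths are hypothetical and only for illustration; the underlying commands it produces are `hdfs dfs -put <localsrc> <dst>` and `hadoop distcp [-m <maps>] <src> <dst>`.

```python
import shlex
from urllib.parse import urlparse

# Assumption: sources with these schemes are reachable from the cluster nodes,
# so a MapReduce-based copy can read them. Adjust the set for your setup.
CLUSTER_VISIBLE_SCHEMES = {"hdfs", "webhdfs", "ftp"}


def build_ingest_command(source, target, num_maps=None):
    """Return the argv list for copying `source` to `target` in HDFS."""
    scheme = urlparse(source).scheme
    if scheme in CLUSTER_VISIBLE_SCHEMES:
        # The cluster can read the source itself, so DistCp can copy in parallel.
        cmd = ["hadoop", "distcp"]
        if num_maps is not None:
            cmd += ["-m", str(num_maps)]  # upper bound on simultaneous map tasks
        return cmd + [source, target]
    # Anything else is treated as a path on the client machine, which the
    # cluster cannot see, so the client has to push the data with -put.
    return ["hdfs", "dfs", "-put", source, target]


if __name__ == "__main__":
    # Hypothetical paths and host, for illustration only.
    local = build_ingest_command("/home/me/data.csv", "/user/me/raw/")
    remote = build_ingest_command(
        "ftp://me@ftp.example.com/data/", "hdfs:///user/me/raw/", num_maps=4
    )
    print(shlex.join(local))
    print(shlex.join(remote))
```

The script only prints the commands; to actually run one, pass the list to `subprocess.run(cmd, check=True)` on a machine where the Hadoop client is installed.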