How to Test HDFS I/O Throughput


File copy time is affected by many factors, including 1) file size, 2) network latency and transfer speed, 3) hard drive seek and read/write times, and 4) the HDFS replication factor.

When you are working with small files (and your 5 MB to 50 MB files are small by HDFS standards), latency and seek times set a lower bound on the copy time; transfer speed and read/write times come on top of that. Essentially, don't expect copy time to grow linearly with file size unless you start working with significantly larger files. HDFS is built around large blocks: the default is 64 MB in Hadoop 1.x and 128 MB in Hadoop 2.x and later, and people often raise it to 512 MB or more.
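To make the point about the lower bound concrete, here is a rough sketch of a copy-time model: a fixed per-file cost plus a part that grows with file size. The function name and both default numbers (2 seconds of fixed overhead, 100 MB/s sustained throughput) are made up for illustration and are not measurements from any cluster.

```python
def estimated_copy_seconds(size_mb, fixed_overhead_s=2.0, throughput_mb_s=100.0):
    """Toy model: fixed per-file cost plus size-proportional transfer time.

    fixed_overhead_s stands in for latency, seeks and connection setup;
    throughput_mb_s stands in for sustained transfer and write speed.
    Both defaults are assumed values, not measured ones.
    """
    return fixed_overhead_s + size_mb / throughput_mb_s


# Compare two "small" files with one that is large enough for size to dominate.
for size_mb in (5, 50, 5000):
    print(f"{size_mb:>5} MB -> {estimated_copy_seconds(size_mb):.2f} s")
```

With these assumed numbers the 50 MB file is ten times larger than the 5 MB file but takes only about 1.2 times as long, because the fixed cost dominates. Only at the multi-gigabyte end does the time start tracking the size.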

To test I/O times, try TestDFSIO and testfilesystem. Both are found in Hadoop's hadoop-mapreduce-client-jobclient-*.jar.