
HDFS block size and its relationship with underlying physical file-system block size


There's no connection between the two at all. The 128MB block size in HDFS just means that HDFS splits a file's data into chunks (blocks) of at most 128MB each. A single HDFS file can be much larger than 128MB; it is simply stored as several blocks, and each block lives as an ordinary file on a DataNode's local filesystem. Those 128MB block files are no different from 128MB files created by any other program.
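You can see this split from the client side with the Hadoop Java API, which reports the block layout of a file. Here's a minimal sketch; the path and the 300MB file size are hypothetical, and it assumes a configured HDFS client on the classpath:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListBlocks {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();      // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/data/large-file.bin");  // hypothetical ~300MB file
        FileStatus status = fs.getFileStatus(file);

        // One logical HDFS file, several physical blocks
        // (for 300MB with a 128MB block size: 128MB + 128MB + 44MB).
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.printf("offset=%d length=%d hosts=%s%n",
                    block.getOffset(), block.getLength(),
                    String.join(",", block.getHosts()));
        }
    }
}
```

Each entry corresponds to one block stored as a regular file on some DataNode's disk, which is exactly why the local filesystem treats it like any other file.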

You're correct that a file whose 4k blocks are scattered all over the disk requires many seeks to read. To avoid that, when the operating system allocates space on disk for a file – any file, not just one created by HDFS – it tries to choose blocks that are adjacent to each other, so that the disk can seek once and then read or write all the blocks together.
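You can get a feel for the seek cost yourself by reading the same data sequentially versus in shuffled 4k chunks. This is a rough illustration, not a rigorous benchmark (OS caching and readahead will skew the numbers), and the file name is hypothetical:

```java
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SeekCost {
    public static void main(String[] args) throws Exception {
        final int CHUNK = 4096;                 // typical filesystem block size
        byte[] buf = new byte[CHUNK];

        try (RandomAccessFile f = new RandomAccessFile("big.dat", "r")) {
            long chunks = f.length() / CHUNK;

            // Sequential pass: one seek, then streaming reads.
            long t0 = System.nanoTime();
            f.seek(0);
            for (long i = 0; i < chunks; i++) f.readFully(buf);
            long seq = System.nanoTime() - t0;

            // Same chunks in shuffled order: a seek before every read,
            // mimicking a badly fragmented file.
            List<Long> offsets = new ArrayList<>();
            for (long i = 0; i < chunks; i++) offsets.add(i * CHUNK);
            Collections.shuffle(offsets);
            t0 = System.nanoTime();
            for (long off : offsets) { f.seek(off); f.readFully(buf); }
            long rnd = System.nanoTime() - t0;

            System.out.printf("sequential: %d ms, random: %d ms%n",
                    seq / 1_000_000, rnd / 1_000_000);
        }
    }
}
```

On a spinning disk the shuffled pass is typically far slower, which is the same penalty a fragmented file pays.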

For more information, read about disk fragmentation.