Do blocks in HDFS have byte-offset information stored in Hadoop?



Disclaimer: I might be wrong on this one; I have not read much of the HDFS source code.

Basically, datanodes manage blocks, which are just large blobs to them. They know the block id, but that's it. The namenode holds all the metadata, in particular the mapping from a file path to the ids of all of that file's blocks, and where each block is stored. Each block can be stored in one or more locations, depending on its replication settings.
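To make that split concrete, here is a toy model of the namenode-side metadata. It is plain Java and NOT Hadoop's real implementation; the class `NamenodeModel` and its methods `addBlock`, `blocksOf` and `locationsOf` are hypothetical names used only for illustration. It shows the two mappings described above: path to an ordered list of block ids, and block id to the datanodes holding a replica.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of namenode metadata; not Hadoop's actual classes.
public class NamenodeModel {
    // File path -> block ids, in the order they appear in the file.
    private final Map<String, List<Long>> fileToBlocks = new HashMap<>();
    // Block id -> datanodes holding a replica (one entry per replica).
    private final Map<Long, List<String>> blockToLocations = new HashMap<>();

    // Appends a block to the end of the file and records where its replicas live.
    void addBlock(String path, long blockId, List<String> datanodes) {
        fileToBlocks.computeIfAbsent(path, p -> new ArrayList<>()).add(blockId);
        blockToLocations.put(blockId, new ArrayList<>(datanodes));
    }

    List<Long> blocksOf(String path) {
        return fileToBlocks.getOrDefault(path, List.of());
    }

    List<String> locationsOf(long blockId) {
        return blockToLocations.getOrDefault(blockId, List.of());
    }

    public static void main(String[] args) {
        NamenodeModel nn = new NamenodeModel();
        // Replication factor 3 in this example: each block has three locations.
        nn.addBlock("/data/log.txt", 1001L, List.of("dn1", "dn2", "dn3"));
        nn.addBlock("/data/log.txt", 1002L, List.of("dn2", "dn4", "dn5"));
        for (long id : nn.blocksOf("/data/log.txt")) {
            System.out.println(id + " -> " + nn.locationsOf(id));
        }
    }
}
```

Running `main` prints `1001 -> [dn1, dn2, dn3]` followed by `1002 -> [dn2, dn4, dn5]`. Note that the natural lookup direction is path to blocks; going from a bare block id back to its file would require a reverse index or a scan.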

I don't think you will find a public API that gives you the information you want from a block id, because HDFS never needs to do the mapping in that direction. Conversely, you can easily get the blocks of a file and their locations. You can try exploring the source code, especially the blockmanagement package.

If you want to learn more, this article about the HDFS architecture could be a good start.


You can run hdfs fsck /path/to/file -files -blocks to get the list of blocks of a file (add -locations to also see which datanodes hold each replica).

A Block does not contain offset info, only its length. But you can use LocatedBlocks to get all the blocks of a file, in order, and from them easily reconstruct the offset at which each block starts (each LocatedBlock in it also exposes that offset directly via getStartOffset()).
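The reconstruction is just a running sum: a block's start offset is the total length of all blocks before it in the file. The sketch below assumes you already have the blocks in file order (for example, taken from LocatedBlocks); `BlockInfo` is a hypothetical stand-in for Hadoop's Block that keeps only an id and a length, and `startOffsets`/`offsetOf` are illustrative helpers, not Hadoop methods.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: derive block start offsets from ordered block lengths.
public class BlockOffsets {
    // Hypothetical stand-in for Hadoop's Block: id and length, no offset.
    static final class BlockInfo {
        final long blockId;
        final long numBytes;

        BlockInfo(long blockId, long numBytes) {
            this.blockId = blockId;
            this.numBytes = numBytes;
        }
    }

    // Offset of block i = sum of the lengths of blocks 0..i-1.
    // Assumes the list is in file order.
    static List<Long> startOffsets(List<BlockInfo> blocks) {
        List<Long> offsets = new ArrayList<>();
        long offset = 0;
        for (BlockInfo b : blocks) {
            offsets.add(offset);
            offset += b.numBytes;
        }
        return offsets;
    }

    // Start offset of the given block id, or -1 if the block is not in this file.
    static long offsetOf(List<BlockInfo> blocks, long blockId) {
        long offset = 0;
        for (BlockInfo b : blocks) {
            if (b.blockId == blockId) {
                return offset;
            }
            offset += b.numBytes;
        }
        return -1;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Example: a 300 MB file with a 128 MB block size -> 128 + 128 + 44 MB.
        List<BlockInfo> file = List.of(
                new BlockInfo(1001L, 128 * mb),
                new BlockInfo(1002L, 128 * mb),
                new BlockInfo(1003L, 44 * mb));
        System.out.println(startOffsets(file));
        System.out.println(offsetOf(file, 1003L));
    }
}
```

With the example file, `main` prints `[0, 134217728, 268435456]` and then `268435456`: the third block starts right after the two full 128 MB blocks.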