How will the free space of a block in a data node be utilized by hdfs in hadoop?

hadoop hdfs block

Logically, if you files are smaller than block size than HDFS will reduce the block size for that particular files to the size of file. So HDFS will only use 10MB for storing 10MB of small files.It will not waste 54MB or leave it blank.

Small file sin HDFS are desribed in detail here : http://blog.cloudera.com/blog/2009/02/the-small-files-problem/

hadoop hdfs block

The remaining 54MB would be utilized for some other file. So this is how it works, assume you do a put or copyFromLocal of 2 small files each with size 20MB and your block size is 64MB. Now HDFS calculates the available space (suppose previously you have saved a file of 10 MB in a 64MB block it includes these remaining 54MB as well)in the filesystem(not available blocks) and gives a report in terms of block. Since you have 2 files, with replication factor as 3, so a total of 6 blocks would be allocated for your files even if your file size is less than the block size. If the cluster doesn't have 6 blocks(6*64MB) then the put process would fail. Since the report is fetched in terms of space not in terms of blocks, you would never run out of blocks. The only time files are measured in blocks is at block allocation time.

Read this blog for more information.

CodeHunter

How will the free space of a block in a data node be utilized by hdfs in hadoop?

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last