Deleting files from HDFS does not free up disk space


I ran into a similar issue on our cluster, which probably stemmed from a failed upgrade.

First, make sure the upgrade has been finalized on the namenode:

hdfs dfsadmin -finalizeUpgrade

What I found was that, for some reason, the datanodes had not finalized their storage directories at all.

On each datanode, you should see the following directory layout:

/{mountpoint}/dfs/dn/current/{blockpool}/current

and

/{mountpoint}/dfs/dn/current/{blockpool}/previous
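To check which disks on a datanode still carry a previous directory, and how much space it pins, a small script can walk the layout above. This is a minimal sketch, assuming the layout `{data_dir}/current/{blockpool}/previous` exactly as shown and assuming the upgrade hard-linked block files between previous and current. Under that assumption, a file in previous with a link count of 1 no longer exists in current and is space that only finalizing would free. The function names `previous_usage` and `find_unfinalized` and the example paths are hypothetical.

```python
import os
from pathlib import Path


def previous_usage(prev: Path):
    """Return (total_bytes, bytes_only_in_previous) for files under a 'previous' dir.

    Assumes block files were hard-linked during the upgrade, so a link count of 1
    means the file no longer has a twin in 'current'.
    """
    total = only_here = 0
    for root, _dirs, files in os.walk(prev):
        for name in files:
            st = os.lstat(os.path.join(root, name))  # do not follow symlinks
            total += st.st_size
            if st.st_nlink == 1:
                only_here += st.st_size
    return total, only_here


def find_unfinalized(data_dirs):
    """List (previous_dir, total_bytes, bytes_only_in_previous) per block pool.

    data_dirs: datanode data directories such as '/mnt/disk1/dfs/dn' (example paths).
    """
    found = []
    for data_dir in data_dirs:
        current = Path(data_dir) / "current"
        if not current.is_dir():
            continue  # not a datanode data dir, or not mounted
        for bp in sorted(current.iterdir()):
            prev = bp / "previous"
            if prev.is_dir():
                total, only_here = previous_usage(prev)
                found.append((prev, total, only_here))
    return found


if __name__ == "__main__":
    import sys

    # Usage (paths are examples): python check_previous.py /mnt/disk1/dfs/dn /mnt/disk2/dfs/dn
    for prev, total, only_here in find_unfinalized(sys.argv[1:]):
        print(f"{prev}: {total / 2**20:.1f} MiB total, "
              f"{only_here / 2**20:.1f} MiB present only in previous")
```

Any output at all means a block pool was never finalized on that disk; the "present only in previous" figure estimates how much would come back once it is.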

If the upgrade has not been finalized, the previous directory still contains all block data that existed before the upgrade. Deleting files in HDFS does not remove that data, so your used storage never goes down.
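Why deleting does not free space can be illustrated with plain hard links. The sketch below assumes, as is my understanding of the datanode upgrade, that block files are hard-linked into previous rather than copied; it then simulates an HDFS delete by removing only the "current" name. The file names are hypothetical stand-ins for real block files.

```python
import os
import tempfile


def demo_hardlink_retention(size=1024):
    """Simulate a block hard-linked into previous/ and then deleted from current/.

    Returns (bytes still on disk via previous/, remaining link count,
    whether the current/ name still exists).
    """
    with tempfile.TemporaryDirectory() as tmp:
        current_blk = os.path.join(tmp, "current_blk")    # stands in for current/.../blk_N
        previous_blk = os.path.join(tmp, "previous_blk")  # stands in for previous/.../blk_N
        with open(current_blk, "wb") as f:
            f.write(b"\0" * size)
        os.link(current_blk, previous_blk)  # assumed upgrade behaviour: one inode, two names
        os.unlink(current_blk)              # simulated HDFS delete: only the current/ name goes
        st = os.stat(previous_blk)
        # The inode and its disk blocks survive as long as one link remains.
        return st.st_size, st.st_nlink, os.path.exists(current_blk)


if __name__ == "__main__":
    print(demo_hardlink_retention())  # prints (1024, 1, False)
```

The bytes stay allocated until the last link, the one under previous, is removed, which is what finalizing does.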

In the end, the simplest solution was enough: restart the namenode.

Then watch the datanode log; you should see something like this:

INFO org.apache.hadoop.hdfs.server.common.Storage: Finalizing upgrade for storage directory
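On a node with many disks it is easier to search the log than to watch it. This is a minimal sketch that matches only the message text quoted above, so timestamps or other prefixes do not matter; it assumes the storage directory path, if any, follows the message on the same line. The log file location differs between distributions, so the path in the usage comment is only an example, and `finalized_dirs` is a hypothetical helper name.

```python
import re
from typing import Iterable, List

# Match the message text only; capture the first token after it (expected to be a path).
FINALIZE_RE = re.compile(r"Finalizing upgrade for storage directory\s*(\S*)")


def finalized_dirs(log_lines: Iterable[str]) -> List[str]:
    """Return the token following the finalize message on each matching line ('' if none)."""
    dirs = []
    for line in log_lines:
        match = FINALIZE_RE.search(line)
        if match:
            dirs.append(match.group(1))
    return dirs


if __name__ == "__main__":
    import sys

    # Usage (log path is an example): python scan_log.py /var/log/hadoop/datanode.log
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as log:
            for d in finalized_dirs(log):
                print(d or "(no path on this line)")
```

Comparing the printed directories with the output of a disk-by-disk check for previous directories shows whether every block pool on every disk was finalized.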

Afterwards, the previous directories are removed in the background and the disk space is reclaimed.