Is there a way to set a TTL for certain directories in HDFS?

Is there a way to set a TTL for certain directories in HDFS?


HDFS does not support this yet.

A JIRA ticket was created to add the feature: https://issues.apache.org/jira/browse/HDFS-6382

However, the fix is not yet available.

In the meantime, you need to handle it yourself with a cron job. The job can be a simple shell, Perl, or Python script that periodically deletes data older than a pre-configured period.

This job could:

  • Run periodically (e.g. once an hour or once a day)
  • Take as input the list of folders or files to check, along with the TTL for each
  • Delete any file or folder that is older than its specified TTL

A short script is enough to achieve this.
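
As a rough illustration of the job described above, here is a minimal Python sketch. It assumes the `hdfs` command is on the PATH and that `hdfs dfs -ls` prints the usual eight columns (permissions, replication, owner, group, size, date, time, path), with times in the client's local time zone. The directory names and TTL values in `TTL_BY_DIR` are hypothetical, as are the function names; treat it as a starting point, not a tested production script.

```python
import subprocess
from datetime import datetime, timedelta

# Hypothetical configuration: directory -> TTL. Adjust to your own layout.
TTL_BY_DIR = {
    "/data/tmp": timedelta(days=1),
    "/data/logs": timedelta(days=30),
}


def parse_ls(output):
    """Yield (path, modification_time) pairs from `hdfs dfs -ls` output."""
    for line in output.splitlines():
        # Split into at most 8 fields so paths containing spaces stay intact.
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue  # skips the "Found N items" header and blank lines
        try:
            mtime = datetime.strptime(fields[5] + " " + fields[6], "%Y-%m-%d %H:%M")
        except ValueError:
            continue  # not a listing line in the assumed format
        yield fields[7], mtime


def expired_paths(output, ttl, now):
    """Return the paths in the listing whose modification time is older than the TTL."""
    cutoff = now - ttl
    return [path for path, mtime in parse_ls(output) if mtime < cutoff]


def purge(directory, ttl, dry_run=True):
    """List the direct children of `directory` and delete those older than `ttl`."""
    listing = subprocess.run(
        ["hdfs", "dfs", "-ls", directory],
        capture_output=True, text=True, check=True,
    ).stdout
    for path in expired_paths(listing, ttl, datetime.now()):
        if dry_run:
            print("would delete", path)
        else:
            # Without -skipTrash, the path goes to the HDFS trash if trash is enabled.
            subprocess.run(["hdfs", "dfs", "-rm", "-r", path], check=True)
```

To use it, loop over `TTL_BY_DIR.items()` and call `purge(directory, ttl)` from a script that cron runs hourly or daily. Start with `dry_run=True` and check the printed paths before enabling real deletion. Note that this only inspects the direct children of each directory and relies on modification time, which for a directory changes when entries are added or removed in it.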