Is there a way to set a TTL for certain directories in HDFS?
HDFS does not support this yet.
A JIRA ticket was opened to add the feature: https://issues.apache.org/jira/browse/HDFS-6382
However, it has not been implemented yet.
For now, you need to handle it yourself with a cron job. You can write a job (a simple shell, Perl, or Python script is enough) that periodically deletes data older than a pre-configured period.
This job could:
- Run periodically (e.g., once an hour or once a day)
- Take the list of folders or files which need to be checked, along with their TTL as input
- Delete any file or folder that is older than its specified TTL
This is straightforward to do with a short script.
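
The steps above could be sketched roughly as follows in Python. This is only a minimal sketch, not a tested production tool: the function names (`expired_paths`, `purge_expired`) and the `dry_run` flag are made up for illustration, it assumes the `hdfs` command is on the PATH, and it assumes `hdfs dfs -ls` prints the usual eight columns (permissions, replication, owner, group, size, date, time, path) with timestamps in the local time zone of the machine running the script.

```python
import subprocess
from datetime import datetime, timedelta


def expired_paths(ls_output, ttl, now):
    """Return the paths in `hdfs dfs -ls` output last modified more than `ttl` ago."""
    expired = []
    for line in ls_output.splitlines():
        # Assumed columns: perms, replication, owner, group, size, date, time, path.
        # maxsplit=7 keeps paths that contain spaces in one piece.
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue  # e.g. the "Found N items" header line
        try:
            modified = datetime.strptime(fields[5] + " " + fields[6], "%Y-%m-%d %H:%M")
        except ValueError:
            continue  # not an entry line in the assumed format
        if now - modified > ttl:
            expired.append(fields[7])
    return expired


def purge_expired(ttl_by_dir, dry_run=True):
    """Check each directory (hypothetical input: {path: timedelta}) and delete expired entries."""
    now = datetime.now()
    for directory, ttl in ttl_by_dir.items():
        listing = subprocess.run(
            ["hdfs", "dfs", "-ls", directory],
            capture_output=True, text=True, check=True,
        ).stdout
        for path in expired_paths(listing, ttl, now):
            if dry_run:
                print("would delete", path)
            else:
                # Without -skipTrash the data goes to the HDFS trash if trash is enabled.
                subprocess.run(["hdfs", "dfs", "-rm", "-r", path], check=True)
```

A small wrapper script scheduled from cron could then call something like `purge_expired({"/data/logs": timedelta(days=7)}, dry_run=False)`, where `/data/logs` is just a placeholder path. Starting with `dry_run=True` shows what would be removed before anything is actually deleted. Note that this only looks at the direct children of each directory and goes by their modification time, so a directory whose contents change often may never be considered expired.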