
Get modified date of file on web site without using the Last-Modified header value


As others have mentioned here, it's hard to trust the Last-Modified header to tell you when the file was last updated.

If you don't mind downloading the full contents of the file, you could store an MD5 hash of it. If the hash differs on a later request, you know the contents of the file changed.

From a Bash shell, you could do:

curl -s www.google.com | md5      # macOS; on Linux, use md5sum instead

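Using the excellent Python Requests library:

import hashlib
import requests

r = requests.get('http://www.example.com')
# hash the raw bytes: r.text is a decoded str, which hashlib.md5 rejects in Python 3
digest = hashlib.md5(r.content).hexdigest()
print(digest)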
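Either one-liner only prints the current hash; to detect a change you also have to remember the previous hash between runs. Below is a minimal sketch, assuming the hash is kept in a small local text file; the function name `content_changed` and the state-file name are hypothetical. MD5 is adequate here because the goal is change detection, not protection against tampering.

```python
import hashlib
from pathlib import Path


def content_changed(content: bytes, state_file: Path) -> bool:
    """Compare content's MD5 with the hash saved by the previous call.

    Returns True if the hash differs or no hash has been saved yet,
    then stores the new hash in state_file for the next call.
    """
    new_hash = hashlib.md5(content).hexdigest()
    old_hash = state_file.read_text().strip() if state_file.exists() else None
    state_file.write_text(new_hash)
    return new_hash != old_hash
```

With Requests, a call might look like `content_changed(requests.get('http://www.example.com').content, Path('example.md5'))`. The first call always reports a change, since there is nothing to compare against yet.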


If you're retrieving data over HTTP, there is no guarantee that what you're requesting corresponds to a physical file, or to anything else with a concept of a "last modified" date, so within the HTTP protocol there's no way (other than Last-Modified) to know. You will probably want to retrieve the file whenever you don't have a sufficiently recent local copy, and you will have to decide for your purposes what "sufficiently recent" means.
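One way to act on "sufficiently recent" is to judge freshness by the local copy's own modification time, which you control, rather than by anything the server reports. A minimal sketch, assuming the downloaded copy lives at a local path; `needs_refresh` and the path and one-hour threshold in the usage note are hypothetical choices.

```python
import time
from pathlib import Path


def needs_refresh(local_copy: Path, max_age_seconds: float) -> bool:
    """Return True if there is no local copy or it is older than max_age_seconds.

    Age is measured from the local file's mtime, i.e. when we last saved it,
    not from any date the server claims.
    """
    if not local_copy.exists():
        return True
    age = time.time() - local_copy.stat().st_mtime
    return age > max_age_seconds
```

For example, `if needs_refresh(Path('report.csv'), max_age_seconds=3600):` you would re-download and overwrite the file, which also resets its mtime for the next check.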

If you have a user account on the host and can log in remotely via SSH or similar, you may be able to check the actual file's modification date.
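If you do have shell access, that check can be scripted. A sketch from Python, assuming non-interactive (key-based) SSH login and a Linux host whose `stat` comes from GNU coreutils; `remote_mtime` and `parse_epoch` are hypothetical helper names.

```python
import shlex
import subprocess
from datetime import datetime, timezone


def parse_epoch(output: str) -> datetime:
    """Turn `stat -c %Y` output (seconds since the epoch) into a UTC datetime."""
    return datetime.fromtimestamp(int(output.strip()), tz=timezone.utc)


def remote_mtime(host: str, path: str) -> datetime:
    """Read a file's modification time on a remote host over SSH.

    Assumes GNU coreutils `stat` on the remote side; BSD/macOS `stat`
    would need `-f %m` instead of `-c %Y`.
    """
    # ssh hands the command string to the remote shell, so quote the path
    command = "stat -c %Y " + shlex.quote(path)
    result = subprocess.run(
        ["ssh", host, command], capture_output=True, text=True, check=True
    )
    return parse_epoch(result.stdout)
```

A call such as `remote_mtime('user@example.com', '/var/www/html/data.csv')` (hypothetical host and path) would return the file's mtime as a timezone-aware datetime, or raise `CalledProcessError` if the SSH command fails.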


As I see it, you are basically maintaining a cache. HTTP has more than just the Last-Modified header to facilitate caching, but the logic is not all that simple. The W3C has a discussion of how to implement a cache that you may find helpful.
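One such mechanism is the entity tag: the server may send an `ETag` header, and on a later request you send that value back in `If-None-Match`; if the resource is unchanged, the server can reply `304 Not Modified` with no body. A minimal sketch using only the standard library, assuming the server actually emits ETags (many do not, and like Last-Modified the value is only as trustworthy as the server producing it); `fetch_if_changed` and `conditional_headers` are hypothetical names.

```python
import urllib.error
import urllib.request
from typing import Dict, Optional, Tuple


def conditional_headers(etag: Optional[str]) -> Dict[str, str]:
    """Build request headers for a conditional GET (empty on the first fetch)."""
    return {"If-None-Match": etag} if etag else {}


def fetch_if_changed(
    url: str, etag: Optional[str] = None
) -> Tuple[Optional[bytes], Optional[str]]:
    """Fetch url unless the copy identified by etag is still current.

    Returns (body, etag). body is None when the server answers
    304 Not Modified, meaning the cached copy can be reused.
    """
    request = urllib.request.Request(url, headers=conditional_headers(etag))
    try:
        with urllib.request.urlopen(request) as response:
            return response.read(), response.headers.get("ETag")
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for a 304, since no default handler treats it as success
        if err.code == 304:
            return None, etag
        raise
```

Store the returned ETag next to your cached body and pass it on the next call; a `None` body means the server considers the resource unchanged.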