wget: obtaining files matching regex


Be careful: --accept-regex is matched against the complete URL. Our target is a set of specific file names, so we will use -A (--accept) instead, which takes file name suffixes or patterns.

For example,

wget -r -np -nH -A "IMG[012][0-9].jpg" http://x.com/y/z/ 

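will download every file named IMG00.jpg through IMG29.jpg found under that URL. Here -r recurses, -np keeps wget from ascending to the parent directory, and -nH skips creating a directory named after the host.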

Note that the -A pattern is not a regular expression: a matching pattern contains shell-like wildcards, e.g. ‘books*’ or ‘zelazny*196[0-9]*’.
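To check a -A pattern before starting a recursive download, one option is to try it against a few sample file names with the shell's own glob matching. The sketch below assumes the shell's case matching behaves closely enough to wget's pattern matching for simple patterns like this one; glob_matches is a hypothetical helper, not part of wget, and the sample names are made up.

```shell
#!/bin/sh
# Preview which file names a wget -A pattern would accept.
# glob_matches is a hypothetical helper; it uses the shell's glob
# matching (case) as a stand-in for wget's own matcher.
glob_matches() {
    pattern=$1
    name=$2
    # An unquoted variable in a case label is treated as a glob pattern.
    case $name in
        $pattern) echo "accept" ;;
        *)        echo "reject" ;;
    esac
}

# Sample file names (made up for illustration).
for name in IMG00.jpg IMG29.jpg IMG30.jpg IMG5.jpg notes.txt; do
    printf '%s: %s\n' "$name" "$(glob_matches 'IMG[012][0-9].jpg' "$name")"
done
```

With these sample names, only IMG00.jpg and IMG29.jpg are reported as accepted: IMG30.jpg fails on the first digit, and IMG5.jpg has only one digit where the pattern requires two.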

References:
wget manual: https://www.gnu.org/software/wget/manual/wget.html
regex: https://regexone.com/


I'm reading in the wget man page:

  --accept-regex urlregex
  --reject-regex urlregex
      Specify a regular expression to accept or reject the complete URL.

and noticing that it mentions the complete URL (e.g. something like
ftp://ftp.fu-berlin.de/pub/misc/movies/database/diffs/diffs-000121.tar.gz)

So I suggest (without having tried it) to use
--accept-regex='.*diffs-0001[0-9][0-9]\.tar\.gz'

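(and perhaps give the appropriate --regex-type too; the possible types are posix, which is the default, and pcre, which requires a wget built with libpcre support)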
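Since the suggestion above is untested, a quick way to sanity-check the regular expression is to run it against a few complete URLs before handing it to wget. This is only a sketch: it assumes grep -E is a reasonable stand-in for wget's default posix regex type, url_accepted is a hypothetical helper, and every file name other than diffs-000121.tar.gz is made up. The hyphen is written unescaped, since it needs no escaping outside a bracket expression.

```shell
#!/bin/sh
# Check a candidate --accept-regex against complete URLs.
# url_accepted is a hypothetical helper; grep -E stands in for
# wget's posix regex matching (an assumption, not wget itself).
url_accepted() {
    regex=$1
    url=$2
    if printf '%s\n' "$url" | grep -Eq -- "$regex"; then
        echo "accept"
    else
        echo "reject"
    fi
}

regex='.*diffs-0001[0-9][0-9]\.tar\.gz'
base='ftp://ftp.fu-berlin.de/pub/misc/movies/database/diffs'

# Only the first file name comes from the example URL; the others are illustrative.
for f in diffs-000121.tar.gz diffs-000199.tar.gz diffs-000221.tar.gz; do
    printf '%s: %s\n' "$f" "$(url_accepted "$regex" "$base/$f")"
done
```

The first two URLs are accepted and the third is rejected, because its name continues with 0002 rather than 0001.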

BTW, for such tasks, I would also consider using some scripting language à la Python (or use libcurl or curl)
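As a rough illustration of the scripting route, here in plain shell with curl rather than Python, the sketch below generates the candidate URLs itself instead of crawling. It assumes the files are numbered diffs-000100 through diffs-000199, which is what the regex above would match; list_urls is a hypothetical helper, and whether those files actually exist on the server is not checked here.

```shell
#!/bin/sh
# Generate the URLs that the pattern diffs-0001[0-9][0-9].tar.gz covers.
# list_urls is a hypothetical helper; the 100..199 range is an assumption
# derived from the regex, not from a server listing.
list_urls() {
    base=$1
    n=100
    while [ "$n" -le 199 ]; do
        printf '%s/diffs-000%d.tar.gz\n' "$base" "$n"
        n=$((n + 1))
    done
}

base='ftp://ftp.fu-berlin.de/pub/misc/movies/database/diffs'

# Show the first few URLs only; nothing is downloaded by this line.
list_urls "$base" | head -n 3

# To actually fetch them, one URL per curl call (uncomment to run):
# list_urls "$base" | xargs -n 1 curl -f -O
```

Generating the list up front avoids a recursive crawl entirely, at the cost of having to know the naming scheme in advance.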