wget: obtaining files matching regex
Be careful: --accept-regex is matched against the complete URL. Our target is a set of specific file names, so we will use -A instead.
For example,
wget -r -np -nH -A "IMG[012][0-9].jpg" http://x.com/y/z/
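will download all the files from IMG00.jpg to IMG29.jpg found under that URL. Here -r turns on recursive retrieval, -np keeps wget from ascending to the parent directory, and -nH stops it from creating a host-named directory.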
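Note that -A takes shell-like wildcards, not a regular expression. Per the wget manual, a matching pattern contains shell-like wildcards, e.g. ‘books*’ or ‘zelazny*196[0-9]*’. Quote the pattern so that your shell does not expand it.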
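To see what such a wildcard pattern does and does not accept, here is a minimal sketch in Python. It uses the standard fnmatch module as a stand-in for wget's own matching, which is an assumption: both follow shell-style rules for this pattern, but they are not the same implementation. The file names in the list are made up for illustration.

```python
from fnmatch import fnmatchcase

# The same pattern that is passed to -A in the example above.
PATTERN = "IMG[012][0-9].jpg"

# Hypothetical file names, made up for this illustration.
names = [
    "IMG00.jpg",
    "IMG17.jpg",
    "IMG29.jpg",
    "IMG30.jpg",   # first digit is outside [012]
    "IMG123.jpg",  # one digit too many
    "img05.jpg",   # wrong case
]

# fnmatchcase does case-sensitive, shell-style matching on the whole name.
accepted = [name for name in names if fnmatchcase(name, PATTERN)]
print(accepted)
```

Only the names with exactly two digits, the first being 0, 1 or 2, pass the filter.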
References:
wget manual: https://www.gnu.org/software/wget/manual/wget.html
regex: https://regexone.com/
I'm reading the wget man page:
--accept-regex urlregex
--reject-regex urlregex
Specify a regular expression to accept or reject the complete URL.
Notice that it mentions the complete URL (e.g. something like ftp://ftp.fu-berlin.de/pub/misc/movies/database/diffs/diffs-000121.tar.gz).
So I suggest (without having tried it) using --accept-regex='.*diffs\-0001[0-9][0-9]\.tar\.gz' (and perhaps giving the appropriate --regex-type too; the manual lists posix and pcre as the possible types).
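As a rough check of that expression without running wget, the sketch below tries it against a few complete URLs using Python's re module. Treat it as a plausibility check only: Python's regex dialect is neither the posix nor the pcre type that wget offers, and the use of fullmatch (the whole URL must match) is an assumption about how strictly the expression is applied. Only the first URL comes from the text above; the others are hypothetical.

```python
import re

# The expression suggested above, compiled with Python's own regex engine.
URL_REGEX = re.compile(r".*diffs\-0001[0-9][0-9]\.tar\.gz")

BASE = "ftp://ftp.fu-berlin.de/pub/misc/movies/database/diffs/"
urls = [
    BASE + "diffs-000121.tar.gz",  # from the text above
    BASE + "diffs-000199.tar.gz",  # hypothetical
    BASE + "diffs-000221.tar.gz",  # hypothetical: wrong prefix digits
    BASE + "diffs-00012.tar.gz",   # hypothetical: one digit short
]

# Keep only the URLs that the expression matches from start to end.
matching = [url for url in urls if URL_REGEX.fullmatch(url)]

for url in matching:
    print(url)
```

Because the expression is applied to the complete URL, the leading .* is what absorbs the scheme, host and directory part.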
BTW, for such tasks, I would also consider using some scripting language à la Python (or using libcurl or curl).
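A minimal sketch of the scripting route, assuming the file names are known in advance (IMG00.jpg to IMG29.jpg under http://x.com/y/z/, as in the first example), so that no directory listing has to be parsed. It uses only the Python standard library. The function names candidate_urls and download_all are made up for this illustration, and a file that cannot be fetched is simply skipped.

```python
import os
import urllib.error
import urllib.request
from urllib.parse import urljoin, urlsplit


def candidate_urls(base, first=0, last=29):
    """Build the URLs IMG<first>.jpg .. IMG<last>.jpg (two digits) under base."""
    return [urljoin(base, f"IMG{i:02d}.jpg") for i in range(first, last + 1)]


def download_all(urls, dest_dir="."):
    """Try to fetch every URL into dest_dir; return the paths that were saved."""
    saved = []
    for url in urls:
        # Use the last path component of the URL as the local file name.
        filename = os.path.basename(urlsplit(url).path)
        target = os.path.join(dest_dir, filename)
        try:
            urllib.request.urlretrieve(url, target)
        except (urllib.error.URLError, OSError) as exc:
            # Assumption: a missing or unreachable file is not fatal.
            print(f"skipped {url}: {exc}")
            continue
        saved.append(target)
    return saved
```

Calling download_all(candidate_urls("http://x.com/y/z/")) would then attempt the thirty downloads one after another.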