
Scrapy VS wget VS curl


Wget can do this.

See: http://www.linuxjournal.com/content/downloading-entire-web-site-wget

Basically:

$ wget \
     --recursive \
     --no-clobber \
     --page-requisites \
     --html-extension \
     --convert-links \
     --restrict-file-names=windows \
     --domains website.org \
     --no-parent \
         www.website.org/tutorials/html/

--recursive follows links so the whole site is downloaded.

--page-requisites saves the CSS, images, and other files each page needs.
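
For comparison, curl has no recursive mode at all: it fetches one URL per invocation. A single-page grab, reusing the placeholder URL above, would look roughly like this:

$ curl -L --output index.html www.website.org/tutorials/html/

-L follows redirects and --output names the saved file. Mirroring a whole site with curl means wrapping it in your own loop or script, which is exactly the gap wget's --recursive (or a Scrapy spider) fills.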


1. wget can do this, though other tools can do it more easily:

wget -m -k -K -E -p http://url/of/web/site

-p is for downloading assets. Wait options such as -w 10 --random-wait can be added if you are scraping third-party websites (see the first sketch after this list).

2. HTTrack is an effective way of copying the contents of an entire site. It fetches everything needed to make the site, including its code content, work offline (a sample invocation follows this list).

3. WebCopier on Windows.
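
Putting the wait options from point 1 together with the mirror flags gives something like this (the URL is the same placeholder as above):

$ wget -m -k -K -E -p -w 10 --random-wait http://url/of/web/site

-w 10 waits ten seconds between requests and --random-wait randomly varies that delay, which keeps the crawl from hammering the target server.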
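For point 2, a minimal HTTrack invocation from the command line looks roughly like this; the URL, output directory, and filter pattern are placeholders:

$ httrack "http://www.website.org/" -O ./mirror-of-site "+*.website.org/*" -v

-O sets the output directory, the "+..." argument is a scan filter that keeps the crawl on the original domain, and -v prints progress as it runs.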