Get RSS via cURL, fine in browser but 404 error in terminal
Indeed, the curl request returns a 404 status page:
$ curl -g --compressed http://mediosymedia.com/wp-content/plugins/nextgen-gallery/xml/media-rss.php -s -o /dev/null -D-
HTTP/1.1 **404 Not Found**
Date: Tue, 04 Mar 2014 08:12:27 GMT
Server: Apache
X-Pingback: http://mediosymedia.com/xmlrpc.php
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Cache-Control: no-cache, must-revalidate, max-age=0
Pragma: no-cache
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8
Many web servers are suspicious of requests without a browser User-Agent because they expect curl to be used for scraping. As a blocking technique this is not very effective, because simple User-Agent spoofing gets around it:
$ curl -g --compressed http://mediosymedia.com/wp-content/plugins/nextgen-gallery/xml/media-rss.php -s -o /dev/null -D- -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:27.0) Gecko/20100101 Firefox/27.0'
HTTP/1.1 **200 OK**
Date: Tue, 04 Mar 2014 08:13:46 GMT
Server: Apache
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Cache-Control: no-cache, must-revalidate, max-age=0
Pragma: no-cache
Transfer-Encoding: chunked
Content-Type: text/xml;charset=utf-8
So, in practice, make sure you set a User-Agent for your requests that is not curl's default.
My initial thought was that this might be related to cookies (see this question), but this may be a localized issue. This is working fine from my machine:
[root@devtest tmp]# curl -g --compressed http://mediosymedia.com/wp-content/plugins/nextgen-gallery/xml/media-rss.php > temp.xml
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 27926    0 27926    0     0  54564      0 --:--:-- --:--:-- --:--:-- 69815
CORRECTION:
Thanks to Julien for pointing out that the contents of the downloaded file were the custom 404 page. As he mentions, you need to add a user-agent flag (-A) to your curl requests. Note that -A takes only the user-agent string itself, without a "User-Agent:" prefix:
# curl -A "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12" -g --compressed http://mediosymedia.com/wp-content/plugins/nextgen-gallery/xml/media-rss.php > temp.xml
I would just delete my answer, but it's worth leaving up as a warning to others who might be experiencing this issue - make sure you validate the response!
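One way to validate the response automatically is to have curl report the HTTP status code and to keep the download only when it is 2xx. The following is a minimal sketch, not a definitive implementation: the function names fetch_feed and is_success_status are hypothetical, and the default user-agent string is simply the Firefox one used earlier in this thread.

```shell
#!/bin/sh
# Sketch: download a feed and keep it only if the server answered 2xx.
# fetch_feed and is_success_status are hypothetical helper names.

# Succeeds only for a three-digit status code in the 2xx range.
is_success_status() {
    case "$1" in
        2[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: fetch_feed URL OUTPUT_FILE [USER_AGENT]
fetch_feed() {
    feed_url=$1
    feed_out=$2
    # Assumed default: a browser-like user-agent string.
    feed_default_ua='Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:27.0) Gecko/20100101 Firefox/27.0'
    feed_ua=${3:-$feed_default_ua}

    feed_tmp=$(mktemp) || return 1

    # -s: no progress meter, -g: no URL globbing, --compressed: accept gzip,
    # -A: user-agent string only (no "User-Agent:" prefix),
    # -o: write the body to the temp file,
    # -w '%{http_code}': print the response status code on stdout.
    feed_status=$(curl -s -g --compressed -A "$feed_ua" \
        -o "$feed_tmp" -w '%{http_code}' "$feed_url")

    if is_success_status "$feed_status"; then
        mv "$feed_tmp" "$feed_out"
    else
        # Discard the body: it is most likely an error page, not the feed.
        rm -f "$feed_tmp"
        echo "fetch_feed: $feed_url returned HTTP $feed_status" >&2
        return 1
    fi
}
```

Calling fetch_feed http://mediosymedia.com/wp-content/plugins/nextgen-gallery/xml/media-rss.php temp.xml would then leave temp.xml untouched and return a non-zero exit status on a 404, instead of silently saving the custom error page. For quick one-off commands, curl's -f (--fail) option is a simpler alternative that makes curl exit non-zero on HTTP errors of 400 and above.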