Get RSS via cURL, fine in browser but 404 error in terminal


Indeed, the curl request gets a 404 response:

$ curl -g --compressed http://mediosymedia.com/wp-content/plugins/nextgen-gallery/xml/media-rss.php -s -o /dev/null -D-
HTTP/1.1 **404 Not Found**
Date: Tue, 04 Mar 2014 08:12:27 GMT
Server: Apache
X-Pingback: http://mediosymedia.com/xmlrpc.php
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Cache-Control: no-cache, must-revalidate, max-age=0
Pragma: no-cache
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8

Many web servers are suspicious of requests that lack a browser User-Agent, because they assume curl is being used for scraping. This is not a very effective defense, since simply spoofing the User-Agent gets around it:

$ curl -g --compressed http://mediosymedia.com/wp-content/plugins/nextgen-gallery/xml/media-rss.php -s -o /dev/null -D- -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:27.0) Gecko/20100101 Firefox/27.0'
HTTP/1.1 **200 OK**
Date: Tue, 04 Mar 2014 08:13:46 GMT
Server: Apache
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Cache-Control: no-cache, must-revalidate, max-age=0
Pragma: no-cache
Transfer-Encoding: chunked
Content-Type: text/xml;charset=utf-8

So, in practice, make sure your requests send a User-Agent other than curl's default.
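As a minimal sketch of that advice, the request can be wrapped in a small shell function. The names `UA` and `fetch_feed` are hypothetical and only used for illustration, and the User-Agent string is just the one from the example above; any browser-like value would presumably work as well. curl's `-A` option sets the User-Agent header value and is equivalent to passing the header with `-H`.

```shell
# Hypothetical browser-like User-Agent string (copied from the example above).
UA='Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:27.0) Gecko/20100101 Firefox/27.0'

# fetch_feed URL OUTFILE
# Download URL into OUTFILE, sending the User-Agent above instead of curl's default.
#   -g            disable URL globbing
#   --compressed  request a compressed response and decompress it
#   -s            silent mode (no progress meter)
#   -A            set the User-Agent header value (no "User-Agent:" prefix)
fetch_feed() {
    curl -g --compressed -s -A "$UA" -o "$2" "$1"
}
```

It could then be called as `fetch_feed http://mediosymedia.com/wp-content/plugins/nextgen-gallery/xml/media-rss.php temp.xml`.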


My initial thought was that this might be related to cookies (see this question), but it may be an issue local to your machine. This works fine from mine:

[root@devtest tmp]# curl -g --compressed http://mediosymedia.com/wp-content/plugins/nextgen-gallery/xml/media-rss.php > temp.xml
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 27926    0 27926    0     0  54564      0 --:--:-- --:--:-- --:--:-- 69815

CORRECTION:

Thanks to Julien for pointing out that the contents of the downloaded file were the custom 404 page. As he mentions, you need to add the user agent flag (-A) to your curl requests:

# curl -A "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12" -g --compressed http://mediosymedia.com/wp-content/plugins/nextgen-gallery/xml/media-rss.php > temp.xml

I would just delete my answer, but it's worth leaving up as a warning to others who might be experiencing this issue - make sure you validate the response!
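One way to validate the response, sketched here under the assumption that checking the HTTP status code is enough, is to have curl report the status with `-w '%{http_code}'` and refuse to keep the file unless it is a 2xx. The function names `is_http_success` and `fetch_checked` are hypothetical.

```shell
# is_http_success CODE
# Succeed only if CODE is a three-digit 2xx HTTP status.
is_http_success() {
    case "$1" in
        2[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# fetch_checked URL OUTFILE
# Download URL into OUTFILE, then check the status code before trusting the file.
fetch_checked() {
    # -o sends the body to OUTFILE; -w '%{http_code}' prints the final
    # HTTP status code on stdout once the transfer is done.
    # Note: "status" is a global variable (plain sh has no "local").
    status=$(curl -g --compressed -s -o "$2" -w '%{http_code}' "$1")
    if is_http_success "$status"; then
        return 0
    fi
    echo "request failed with HTTP status $status" >&2
    rm -f "$2"   # do not leave a saved error page behind
    return 1
}
```

With this, a custom 404 page no longer ends up silently saved as `temp.xml`. curl's `-f` option is a shorter alternative that makes curl exit non-zero on HTTP errors, at the cost of not showing which status was returned.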