
406 when using Guzzle but not through browser, or command line cURL or wget


I'm afraid I don't have a proper solution to your problem, but I have it working again.

tl;dr version

It's the User-Agent header: changing it to pretty much anything else makes the request work.

This wget call fails:

wget -d --header="User-Agent: Mozilla/4.0"  https://www.socialquant.net/blog/feed/ 

but this works

wget -d --header="User-Agent: SomeRandomText" https://www.socialquant.net/blog/feed/

And with that, the PHP below now also works:

require 'vendor/autoload.php';

$client = new \GuzzleHttp\Client();
$feed = 'https://www.socialquant.net/blog/feed/';

try {
    $res = $client->get(
        $feed,
        [
            'headers' => [
                'User-Agent' => 'SomeRandomText',
            ]
        ]
    );
    echo $res->getBody();
} catch (\Exception $e) {
    echo 'Exception: ' . $e->getMessage();
}

My thoughts

I started with wget and curl, as you pointed out, and both work when no special headers or options are set. Opening the feed in my browser also worked, and so did Guzzle without the User-Agent set.

Once I set the User-Agent to Mozilla/4.0 or even Mozilla/5.0, the requests started failing with 406 Not Acceptable.

According to the HTTP Status Code definitions, a 406 means

The resource identified by the request is only capable of generating response entities which have content characteristics not acceptable according to the accept headers sent in the request.

In theory, adding Accept and Accept-Encoding headers should resolve the issue, but it didn't, neither via Guzzle nor via wget.
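For reference, this is roughly the kind of request I tried. The exact Accept and Accept-Encoding values below are just examples (nothing the feed prescribes); the server still answered with a 406:

require 'vendor/autoload.php';

$client = new \GuzzleHttp\Client();

try {
    $res = $client->get('https://www.socialquant.net/blog/feed/', [
        'headers' => [
            'User-Agent'      => 'Mozilla/5.0',
            // Example values only; none of the combinations I tried resolved the 406
            'Accept'          => 'application/rss+xml, application/xml, text/xml, */*',
            'Accept-Encoding' => 'gzip, deflate',
        ],
    ]);
    echo $res->getStatusCode();
} catch (\Exception $e) {
    // Still ends up here with a 406 Not Acceptable
    echo 'Exception: ' . $e->getMessage();
}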

I then found the Mozilla Developer Network definition which states:

This response is sent when the web server, after performing server-driven content negotiation, doesn't find any content following the criteria given by the user agent.

This kinda points at the User-Agent again, and it led me to believe that you are indeed correct that mod_security is doing something odd. I am convinced that an update to mod_security or Apache on the client's server added a rule that parses the Mozilla/* user agents in a specific way, since sending User-Agent: Mozilla/4.0 () (note the trailing parentheses) also works.

That's why I'm saying I don't have a proper solution for you. Even though the client wants you to pull the feed, they (or their hosting provider) are still in control of the rules.

Note: I noticed my IP getting blacklisted after a number of failed 406 attempts, after which I had to wait an hour before I could access the site again. That is most likely a mod_security rule too. mod_security might even be picking up on the automated requests with your user agent and start blocking them or rejecting them with the 406.
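If it helps while you experiment, here is a minimal sketch (my own addition, not part of the client's setup) of catching the 406 explicitly so a script doesn't keep retrying and trip that blacklisting rule:

require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Exception\ClientException;

$client = new Client();

try {
    $res = $client->get('https://www.socialquant.net/blog/feed/', [
        'headers' => ['User-Agent' => 'SomeRandomText'],
    ]);
    echo $res->getBody();
} catch (ClientException $e) {
    // With the default http_errors option, Guzzle throws ClientException for 4xx.
    // Detect the 406 and stop instead of retrying in a tight loop.
    if ($e->getResponse() && $e->getResponse()->getStatusCode() === 406) {
        error_log('Feed rejected the request with 406; backing off instead of retrying.');
    } else {
        throw $e;
    }
}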


I don't have a solution for you either, as I'm experiencing the same issue (except I get a 503 error and it fails 60% of the time). Let me know if you have found a solution.

However, I would like to share what I have found through my recent research: certain User-Agents work better than others for me. This makes me believe that it's not quite what Donovan describes (at least for me).

When I set the User-Agent to null, it works 100% of the time. However, I haven't made any large requests yet, because I'm afraid of getting IP banned, which I know I would with a large request.
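For what it's worth, this is roughly how I compare them. The URL and the list of User-Agent strings below are placeholders (I'm not reproducing the exact "null" case here, since how that gets sent depends on the Guzzle/cURL versions); substitute whatever you're testing against:

require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Exception\RequestException;

$client = new Client(['http_errors' => false]); // return 4xx/5xx responses instead of throwing
$url    = 'https://example.com/';               // placeholder URL

$userAgents = [                                 // placeholder values
    'Mozilla/5.0',
    'SomeRandomText',
    '',                                         // empty value; the closest approximation of "no User-Agent" here
];

foreach ($userAgents as $ua) {
    $label = $ua === '' ? '(empty)' : $ua;
    try {
        $res = $client->get($url, ['headers' => ['User-Agent' => $ua]]);
        printf("%-20s => %d\n", $label, $res->getStatusCode());
    } catch (RequestException $e) {
        // Connection-level failures (DNS, timeouts, etc.) still throw
        printf("%-20s => failed: %s\n", $label, $e->getMessage());
    }
}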

When I do a var_dump of the request itself, I see a lot of arrays that include Guzzle markers. I'm thinking maybe Amazon's detection services can tell that I'm spoofing the headers? I don't know.
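If you want to see the headers Guzzle actually puts on the wire, rather than wading through that var_dump, the debug request option prints cURL's verbose output (request line, headers sent, headers received). A small sketch, with a placeholder URL since I don't know your exact endpoint:

require 'vendor/autoload.php';

$client = new \GuzzleHttp\Client();

// 'debug' => true makes the cURL handler write its verbose output,
// including the exact request headers sent, to STDOUT.
$res = $client->get('https://example.com/', [   // placeholder URL
    'headers' => ['User-Agent' => 'SomeRandomText'],
    'debug'   => true,
]);

echo $res->getStatusCode();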

Hope you figured it out.