Python Requests getting ('Connection aborted.', BadStatusLine("''",)) error

python python-3.x python-requests

The error indicates that the host closed the connection without sending a valid HTTP status line, so it isn't responding in the expected manner. In this case, the likely cause is that it detects that you're trying to scrape it and deliberately disconnects you.

If you try your requests code with this URL from a test website: http://mirror.internode.on.net/pub/test/5meg.test1, you'll see that it downloads normally.

To get around this, fake your user agent. Your user agent identifies your web browser, and web hosts commonly check it to detect bots.

Use the headers field to set your user agent. Here's an example which tells the webhost you're Firefox.

import requests

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.0; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0'}
r = requests.get(url, headers=headers)
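If you make several requests to the same site, one option is to set the user agent once on a `requests.Session`, whose headers are merged into every request made through it. This is a minimal sketch, assuming you want the same Firefox string as above; `make_browser_session` and `FIREFOX_UA` are hypothetical names used only for illustration.

```python
import requests

# Same Firefox 24 user agent string as in the answer above.
FIREFOX_UA = (
    "Mozilla/5.0 (Windows NT 6.0; WOW64; rv:24.0) "
    "Gecko/20100101 Firefox/24.0"
)


def make_browser_session(user_agent: str = FIREFOX_UA) -> requests.Session:
    """Return a Session that sends `user_agent` on every request.

    Headers stored in session.headers are merged into each request made
    through the session, so the user agent only has to be set once.
    """
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session
```

Usage would then be `session = make_browser_session()` followed by `r = session.get(url, timeout=10)`. A Session also keeps cookies between requests, which matters for some of the other checks mentioned below.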

There are lots of other discrepancies¹ between bots and human-operated browsers that web hosts can check for, but the user agent is one of the easiest and most common ones.
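Two of the checks listed in footnote ¹ (pages requested faster than a human could read them, and cookies not being kept) can be partly addressed from plain requests. The sketch below is one possible approach, not a guarantee against detection: it reuses one `requests.Session` so cookies persist, and sleeps a random interval between requests. The function name `fetch_politely` and the 2 to 5 second delay range are arbitrary assumptions; `get` and `sleep` are injectable only so the logic can be tested without a network.

```python
import functools
import random
import time
from typing import Callable, List, Optional

import requests


def fetch_politely(
    urls: List[str],
    get: Optional[Callable[[str], object]] = None,
    min_delay: float = 2.0,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Fetch URLs one at a time, pausing a random interval between them."""
    if get is None:
        # One shared Session, so cookies set by earlier responses are
        # sent back on later requests.
        session = requests.Session()
        get = functools.partial(session.get, timeout=10)
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            # Wait a randomized, human-ish interval before the next request.
            sleep(random.uniform(min_delay, max_delay))
        results.append(get(url))
    return results
```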

If you want your scraper to be harder to detect, you'll want to use a headless browser like headless Chrome² (or ghost.py if you want to stick with Python), which you can trust will behave like a real browser (because it is!).
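The answer suggests ghost.py for a pure-Python route; a commonly used alternative, swapped in here and not named in the answer, is Selenium driving headless Chrome. This is a minimal sketch assuming Selenium 4 and a local Chrome installation; `fetch_rendered_html` is a hypothetical helper name.

```python
def fetch_rendered_html(url: str) -> str:
    """Load `url` in headless Chrome via Selenium and return the page source.

    Assumes Selenium 4 and a local Chrome install; Selenium Manager
    (Selenium 4.6+) normally locates a matching chromedriver automatically.
    """
    # Imported here so this module can be imported without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # Chrome's newer headless mode
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Serialized DOM as the browser currently sees it, i.e. after the
        # page's scripts have had a chance to run.
        return driver.page_source
    finally:
        driver.quit()  # always shut the browser process down
```

Because a real browser loads images, runs scripts and handles cookies itself, it avoids many of the discrepancies that plain requests exposes, at the cost of being much heavier.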

Footnotes:

¹ Other possible checks include whether images are downloaded, whether page resources are fetched in the normal order, whether pages are requested faster than a human could read them, and whether cookies are set properly. Google flags mouse movements it deems insufficiently human-like.

² As of 2018, headless Chrome is the most capable headless browser. If its weight is a problem for you, its slightly outdated predecessors, PhantomJS and ghost.py, are lighter and still usable.


Try sending a fuller set of browser-like headers:

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.0; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0',
    'ACCEPT': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    # 'br' (Brotli) responses are only decoded if the brotli package is installed
    'ACCEPT-ENCODING': 'gzip, deflate, br',
    'ACCEPT-LANGUAGE': 'ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7',
    'REFERER': 'https://www.google.com/',
}
r = requests.get("http://yourdomain.com/", headers=headers)
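HTTP header names are case-insensitive, so upper-case keys like `'ACCEPT'` and `'REFERER'` work. To check exactly what requests will send (your headers merged with its defaults) without contacting the server, you can prepare the request instead of sending it. This sketch uses `Session.prepare_request`; `preview_headers` is a hypothetical helper, and the header values are trimmed versions of those in the answer.

```python
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.0; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0",
    "ACCEPT": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "REFERER": "https://www.google.com/",
}


def preview_headers(url: str, headers: dict):
    """Build (but do not send) a GET request and return its headers.

    The result is requests' case-insensitive header mapping, so
    lookups like result["referer"] work regardless of key casing.
    """
    with requests.Session() as session:
        prepared = session.prepare_request(
            requests.Request("GET", url, headers=headers)
        )
    return prepared.headers
```

For example, `preview_headers("http://yourdomain.com/", headers)["referer"]` gives back the Google referer, and any default header you did not override (such as `Connection`) shows up alongside your own.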


In my case, I had to remove the User-Agent field from the headers:

import requests

url = 'https://...'
headers = {}  # no custom User-Agent: requests sends its own default one
requests.get(url, headers=headers)

Once I set 'User-Agent', I got ('Connection aborted.', BadStatusLine("''",)), and this error occurred only with that particular site. This is my first post; I've gotten a lot of help from this site and hope this helps others who end up here.
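Since some sites reject a spoofed user agent and others reject the default one, one option is to try several header sets in order. requests raises the "Connection aborted" error as `requests.exceptions.ConnectionError`, so that is the exception this sketch catches. `get_with_header_fallback` and `HEADER_CANDIDATES` are hypothetical names, and the order of candidates is an assumption; `get` is injectable only for testing.

```python
from typing import Callable, Dict, Iterable, Optional

import requests

FIREFOX_UA = "Mozilla/5.0 (Windows NT 6.0; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0"

# Tried in order; an empty dict means "send only requests' default headers".
HEADER_CANDIDATES = [{"User-Agent": FIREFOX_UA}, {}]


def get_with_header_fallback(
    url: str,
    candidates: Iterable[Dict[str, str]] = HEADER_CANDIDATES,
    get: Callable = requests.get,
):
    """Return the response for the first header set that doesn't abort."""
    last_error: Optional[Exception] = None
    for headers in candidates:
        try:
            return get(url, headers=headers, timeout=10)
        except requests.exceptions.ConnectionError as exc:
            last_error = exc  # remember the failure, try the next header set
    if last_error is None:
        raise ValueError("no header candidates given")
    raise last_error
```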

CodeHunter
