Why doesn't requests.get() return? What is the default timeout that requests.get() uses?
What is the default timeout that get uses?
The default timeout is None
, which means it'll wait (hang) until the connection is closed.
Just specify a timeout value, like this:
r = requests.get( 'http://www.justdial.com', proxies={'http': '222.255.169.74:8080'}, timeout=5)
From requests documentation:
You can tell Requests to stop waiting for a response after a given number of seconds with the timeout parameter:
>>> requests.get('http://github.com', timeout=0.001)Traceback (most recent call last): File "<stdin>", line 1, in <module>requests.exceptions.Timeout: HTTPConnectionPool(host='github.com', port=80): Request timed out. (timeout=0.001)
Note:
timeout is not a time limit on the entire response download; rather, an exception is raised if the server has not issued a response for timeout seconds (more precisely, if no bytes have been received on the underlying socket for timeout seconds).
It happens a lot to me that requests.get() takes a very long time to return even if the timeout
is 1 second. There are a few way to overcome this problem:
1. Use the TimeoutSauce
internal class
From: https://github.com/kennethreitz/requests/issues/1928#issuecomment-35811896
import requests from requests.adapters import TimeoutSauceclass MyTimeout(TimeoutSauce): def __init__(self, *args, **kwargs): if kwargs['connect'] is None: kwargs['connect'] = 5 if kwargs['read'] is None: kwargs['read'] = 5 super(MyTimeout, self).__init__(*args, **kwargs)requests.adapters.TimeoutSauce = MyTimeout
This code should cause us to set the read timeout as equal to the connect timeout, which is the timeout value you pass on your Session.get() call. (Note that I haven't actually tested this code, so it may need some quick debugging, I just wrote it straight into the GitHub window.)
2. Use a fork of requests from kevinburke: https://github.com/kevinburke/requests/tree/connect-timeout
From its documentation: https://github.com/kevinburke/requests/blob/connect-timeout/docs/user/advanced.rst
If you specify a single value for the timeout, like this:
r = requests.get('https://github.com', timeout=5)
The timeout value will be applied to both the connect and the read timeouts. Specify a tuple if you would like to set the values separately:
r = requests.get('https://github.com', timeout=(3.05, 27))
NOTE: The change has since been merged to the main Requests project.
3. Using evenlet
or signal
as already mentioned in the similar question:Timeout for python requests.get entire response
I wanted a default timeout easily added to a bunch of code (assuming that timeout solves your problem)
This is the solution I picked up from a ticket submitted to the repository for Requests.
credit: https://github.com/kennethreitz/requests/issues/2011#issuecomment-477784399
The solution is the last couple of lines here, but I show more code for better context. I like to use a session for retry behaviour.
import requestsimport functoolsfrom requests.adapters import HTTPAdapter,Retrydef requests_retry_session( retries=10, backoff_factor=2, status_forcelist=(500, 502, 503, 504), session=None, ) -> requests.Session: session = session or requests.Session() retry = Retry( total=retries, read=retries, connect=retries, backoff_factor=backoff_factor, status_forcelist=status_forcelist, ) adapter = HTTPAdapter(max_retries=retry) session.mount('http://', adapter) session.mount('https://', adapter) # set default timeout for method in ('get', 'options', 'head', 'post', 'put', 'patch', 'delete'): setattr(session, method, functools.partial(getattr(session, method), timeout=30)) return session
then you can do something like this:
requests_session = requests_retry_session()r = requests_session.get(url=url,...