
aiohttp: set maximum number of requests per second


Since v2.0, when using a ClientSession, aiohttp automatically limits the number of simultaneous connections to 100.

You can modify the limit by creating your own TCPConnector and passing it into the ClientSession. For instance, to create a client limited to 50 simultaneous requests:

import aiohttp

connector = aiohttp.TCPConnector(limit=50)
client = aiohttp.ClientSession(connector=connector)
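Note that recent aiohttp versions expect the ClientSession (and its connector) to be created inside a running event loop. A minimal sketch of the same limit in that style (the URL is just a placeholder):

import asyncio
import aiohttp

async def main():
    connector = aiohttp.TCPConnector(limit=50)
    # The session owns the connector and closes it on exit
    async with aiohttp.ClientSession(connector=connector) as client:
        async with client.get('https://example.com') as resp:
            print(resp.status)

asyncio.run(main())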

In case it's better suited to your use case, there is also a limit_per_host parameter (which is off by default) that you can pass to limit the number of simultaneous connections to the same "endpoint". Per the docs:

limit_per_host (int) – limit for simultaneous connections to the same endpoint. Endpoints are the same if they have an equal (host, port, is_ssl) triple.

Example usage:

import aiohttp

connector = aiohttp.TCPConnector(limit_per_host=50)
client = aiohttp.ClientSession(connector=connector)
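The two parameters can also be combined: limit caps the whole connection pool, while limit_per_host caps each (host, port, is_ssl) triple. A sketch with illustrative values:

import aiohttp

# At most 100 connections overall, and at most 10 to any single endpoint
connector = aiohttp.TCPConnector(limit=100, limit_per_host=10)
client = aiohttp.ClientSession(connector=connector)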


I found one possible solution here: http://compiletoi.net/fast-scraping-in-python-with-asyncio.html

Doing 3 requests at the same time is cool, doing 5000, however, is not so nice. If you try to do too many requests at the same time, connections might start to get closed, or you might even get banned from the website.

To avoid this, you can use a semaphore: a synchronization primitive that limits how many coroutines can run a given section of code at the same time. We'll just create the semaphore before starting the loop, passing as an argument the number of simultaneous requests we want to allow:

sem = asyncio.Semaphore(5)

Then, we just replace (the linked post uses the older yield from syntax; the modern await form is shown here):

page = await get(url, compress=True)

by the same thing, but protected by a semaphore:

async with sem:
    page = await get(url, compress=True)

This will ensure that at most 5 requests can be done at the same time.
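For reference, here is a self-contained version of the semaphore approach in modern async/await syntax; a sketch, where the URLs and the concurrency value of 5 are placeholders:

import asyncio
import aiohttp

urls = [
    # put some URLs here...
]

async def fetch(sem, session, url):
    # The semaphore caps how many fetches run concurrently
    async with sem:
        async with session.get(url) as resp:
            return url, resp.status, await resp.text()

async def main():
    sem = asyncio.Semaphore(5)
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(fetch(sem, session, url) for url in urls))

results = asyncio.run(main())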


You could set a delay per request, or group the URLs into batches and throttle the batches to meet the desired frequency.

1. Delay per request

Force the script to wait between requests using asyncio.sleep:

import asyncio
import aiohttp

delay_per_request = 0.5

urls = [
    # put some URLs here...
]

async def app():
    tasks = []
    for url in urls:
        tasks.append(asyncio.ensure_future(make_request(url)))
        await asyncio.sleep(delay_per_request)
    results = await asyncio.gather(*tasks)
    return results

async def make_request(url):
    print('$$$ making request')
    async with aiohttp.ClientSession() as sess:
        async with sess.get(url) as resp:
            status = resp.status
            text = await resp.text()
            print('### got page data')
            return url, status, text

This can be run with e.g. results = asyncio.run(app()). With delay_per_request = 0.5, a new request is started at most every half second, i.e. at most two requests per second.

2. Batch throttle

Using make_request from above, you can request and throttle batches of URLs like this:

import asyncio
import aiohttp
import time

max_requests_per_second = 0.5

urls = [[
    # put a few URLs here...
], [
    # put a few more URLs here...
]]

async def app():
    results = []
    for i, batch in enumerate(urls):
        t_0 = time.time()
        print(f'batch {i}')
        tasks = [asyncio.ensure_future(make_request(url)) for url in batch]
        for t in tasks:
            d = await t
            results.append(d)
        t_1 = time.time()
        # Throttle: make each batch take at least batch_size / max_requests_per_second seconds
        batch_time = t_1 - t_0
        batch_size = len(batch)
        wait_time = (batch_size / max_requests_per_second) - batch_time
        if wait_time > 0:
            print(f'Too fast! Waiting {wait_time} seconds')
            await asyncio.sleep(wait_time)  # non-blocking, unlike time.sleep
    return results

Again, this can be run with asyncio.run(app()).
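If you want both a concurrency cap and a requests-per-second ceiling at once, one pattern is to combine a semaphore with a stagger between task launches. A sketch reusing the make_request coroutine from above, where urls is a flat list and the max_concurrent and requests_per_second values are illustrative:

import asyncio

max_concurrent = 10
requests_per_second = 2

async def throttled(sem, url):
    # The semaphore caps how many requests are in flight at once
    async with sem:
        return await make_request(url)

async def app():
    sem = asyncio.Semaphore(max_concurrent)
    tasks = []
    for url in urls:
        tasks.append(asyncio.ensure_future(throttled(sem, url)))
        # Stagger launches so at most `requests_per_second` start each second
        await asyncio.sleep(1 / requests_per_second)
    return await asyncio.gather(*tasks)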