Is there a way to make selenium work asynchronously?

python-3.x selenium asynchronous web-scraping thread-safety

One reason this script takes so long is the large number of requests it sends, so one way to speed it up is to send fewer of them. You can do this by using Khan Academy's internal API to get the profile data. For example, a GET request to https://www.khanacademy.org/api/internal/user/discussion/summary?username=user_name&lang=en (with user_name replaced by the actual username) returns the profile data you need (and more) as JSON, so you don't have to scrape several pages. You can then extract the fields you want from the JSON and write them to CSV. Selenium is then needed only to load the discussion data and collect the list of usernames, which should cut the script's running time considerably.
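Here is a minimal sketch of that approach. It uses only the standard library (urllib instead of requests) and a thread pool, so the API calls run concurrently rather than one after another. It assumes the endpoint returns one JSON object per user. Its field names are not documented here, so the CSV step keeps whatever top-level scalar fields come back. The function names and the max_workers default are illustrative choices, not part of any API.

```python
import csv
import io
import json
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Internal, undocumented endpoint from the answer above; it may change without notice.
SUMMARY_URL = "https://www.khanacademy.org/api/internal/user/discussion/summary"


def fetch_profile(username, timeout=10):
    """Fetch one user's profile summary and decode the JSON body."""
    query = urllib.parse.urlencode({"username": username, "lang": "en"})
    with urllib.request.urlopen(f"{SUMMARY_URL}?{query}", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def fetch_profiles(usernames, fetch=fetch_profile, max_workers=8):
    """Run the HTTP requests concurrently in a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, usernames))


def profiles_to_csv(usernames, profiles):
    """Build CSV text with one row per user, keeping only top-level scalar fields.

    Assumes each profile is a JSON object (a dict once decoded).
    """
    fields = ["username"]
    rows = []
    for name, profile in zip(usernames, profiles):
        row = {"username": name}
        for key, value in profile.items():
            # Nested objects and lists are skipped; flatten them yourself if needed.
            if value is None or isinstance(value, (str, int, float, bool)):
                row[key] = value
                if key not in fields:
                    fields.append(key)
        rows.append(row)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)  # missing fields are written as ""
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Usage would be profiles = fetch_profiles(usernames) followed by profiles_to_csv(usernames, profiles), where usernames is the list collected with selenium. Threads suit this because the work is network-bound; keep max_workers modest so the site isn't flooded with requests.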

Side note: even the module links can be extracted, using the JavaScript variable embedded in the main URL's HTML. That variable holds JSON with the course data, including the links.

Here's the code that does that:

    import json

    import bs4
    import requests

    URL = "https://www.khanacademy.org/computing/computer-programming/programming#intro-to-programming"
    BASE_URL = "https://www.khanacademy.org"

    response = requests.get(URL)
    soup = bs4.BeautifulSoup(response.content, "lxml")

    # The course data was in the 19th <script> tag when this was written;
    # the index depends on the page layout and may need adjusting.
    script = soup.find_all("script")[18].text.strip()

    # Keep only the JSON assigned to the app-entry variable...
    script = script.split('{window["./javascript/app-shell-package/app-entry.js"] = ')[1]
    # ...and drop the two trailing characters that follow the JSON object.
    script = script[:-2]

    json_content = json.loads(script)

You can extract the module links from that JSON and request them directly instead.
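The structure of that JSON isn't shown here. The following sketch assumes the links are stored as strings under some key, using the hypothetical key name "url" as a placeholder; inspect json_content to find the real key before relying on it. It walks the parsed JSON recursively and turns relative paths into absolute URLs with urljoin.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.khanacademy.org"


def collect_links(node, key="url", found=None):
    """Recursively collect string values stored under `key` anywhere in parsed JSON.

    "url" is a hypothetical placeholder; replace it with the key the page really uses.
    """
    if found is None:
        found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key and isinstance(v, str):
                found.append(v)
            else:
                collect_links(v, key, found)
    elif isinstance(node, list):
        for item in node:
            collect_links(item, key, found)
    return found


def module_links(json_content, key="url"):
    """Make every collected link absolute and drop duplicates, keeping first-seen order."""
    absolute = (urljoin(BASE_URL, link) for link in collect_links(json_content, key))
    return list(dict.fromkeys(absolute))
```

Each resulting URL can then be requested directly rather than reached by clicking through the page with selenium.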
