Is there a way to make selenium work asynchronously?


One reason this script takes so long is the high number of requests it sends, so one way to speed it up is to send fewer of them. You can do that by using Khan Academy's internal API to get the profile data. For example, sending a GET request to https://www.khanacademy.org/api/internal/user/discussion/summary?username=user_name&lang=en (with user_name replaced by the actual username) returns the profile data you require (and more) as JSON, so you don't have to scrape many separate pages. You can then extract the data from the JSON output and convert it to CSV. You will only need selenium to get the discussion data and to find the list of usernames. This should cut the script's running time considerably.
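As a rough sketch of that idea, the snippet below builds the summary URL for a username, fetches the JSON, and writes selected fields to CSV. It uses only the standard library so it has no extra dependencies; `requests.get(url).json()` would do the same job. The helper names (`build_summary_url`, `fetch_profile`, `profiles_to_csv`) are made up for illustration, and the field names you pass in are an assumption you need to check against the real response, since the JSON layout is not shown here.

```python
import csv
import io
import json
import urllib.parse
import urllib.request

# Endpoint taken from the answer above; it is an internal API and may change.
SUMMARY_URL = "https://www.khanacademy.org/api/internal/user/discussion/summary"


def build_summary_url(username, lang="en"):
    """Return the summary URL for one username, with the query string escaped."""
    query = urllib.parse.urlencode({"username": username, "lang": lang})
    return f"{SUMMARY_URL}?{query}"


def fetch_profile(username, timeout=10):
    """Fetch and decode the JSON summary for one user (performs a network call)."""
    with urllib.request.urlopen(build_summary_url(username), timeout=timeout) as resp:
        return json.load(resp)


def profiles_to_csv(profiles, fieldnames):
    """Serialise a list of profile dicts to CSV text, keeping only `fieldnames`."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops keys we did not ask for; missing keys become "".
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    for profile in profiles:
        writer.writerow(profile)
    return buffer.getvalue()
```

You would call `fetch_profile` once per username collected with selenium, then pass the resulting list and the fields you care about to `profiles_to_csv`.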

Side note: even the module links can be extracted from a JS variable that is present in the page source of the main URL. That variable contains JSON that stores the course data, including the links.

Here's the code that does that:

    import json

    import bs4
    import requests

    URL = "https://www.khanacademy.org/computing/computer-programming/programming#intro-to-programming"
    BASE_URL = "https://www.khanacademy.org"

    response = requests.get(URL)
    soup = bs4.BeautifulSoup(response.content, 'lxml')

    # The course data sits in one of the inline <script> tags. Index 18 is
    # tied to the current page layout and may need adjusting if it changes.
    script = soup.find_all('script')[18].text.strip()

    # Keep only what follows the assignment to the app-entry variable...
    script = script.split('{window["./javascript/app-shell-package/app-entry.js"] = ')[1]
    # ...and drop the two trailing characters that follow the JSON.
    script = script[:-2]

    json_content = json.loads(script)

You can extract the module links from that JSON and query them instead.
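Since the layout of that JSON is not shown here, one hedged way to pull the links out is to walk the whole structure and collect every string stored under a given key. In the sketch below, the key name `"url"` is purely an assumption; inspect `json_content` to find the key that actually holds the module links. `collect_values` and `module_links` are hypothetical helper names.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.khanacademy.org"


def collect_values(node, key):
    """Yield every value stored under `key` anywhere in nested dicts/lists."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield value
            # Recurse regardless, since matches may also sit deeper down.
            yield from collect_values(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from collect_values(item, key)


def module_links(json_content, key="url"):
    """Return absolute links for every string found under `key`.

    The default key name is an assumption, not taken from the real JSON.
    """
    return [
        urljoin(BASE_URL, value)
        for value in collect_values(json_content, key)
        if isinstance(value, str)
    ]
```

Calling `module_links(json_content, key=...)` with the right key gives a list of absolute URLs that can be requested directly, without loading the course page in selenium.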