
Capturing JSON data from intermediate events using Selenium


You will need to use a proxy to capture the network traffic. My suggestion would be to use BrowserMob Proxy.

First, install the BrowserMob Proxy Python client, along with haralyzer, which the script below also imports:

pip install browsermob-proxy haralyzer

You will then need to download the latest BrowserMob Proxy release (2.1.4 at the time of writing), extract it, and place it in your project directory. BrowserMob Proxy is a Java application, so a Java runtime needs to be installed as well. The path to the extracted launcher is what you pass in when setting up the BrowserMob Proxy server (see where Server("browsermob-proxy-2.1.4/bin/browsermob-proxy") is defined in the script below).
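As a small optional sketch, you could resolve and check the launcher path before handing it to Server, so a wrong location fails with a clear message. The helper name browsermob_binary and the default install_dir are my own assumptions for illustration. The release ships a browsermob-proxy.bat launcher next to the shell script, which is what you would want on Windows.

```python
import os
import platform


def browsermob_binary(install_dir="browsermob-proxy-2.1.4"):
    """Return the path to the BrowserMob Proxy launcher inside install_dir.

    Hypothetical helper: install_dir is assumed to be the extracted
    release folder sitting in the project directory.
    """
    # The release contains bin/browsermob-proxy and bin/browsermob-proxy.bat
    name = "browsermob-proxy.bat" if platform.system() == "Windows" else "browsermob-proxy"
    path = os.path.join(install_dir, "bin", name)
    if not os.path.isfile(path):
        raise FileNotFoundError("BrowserMob Proxy launcher not found at " + path)
    return path
```

You would then write Server(browsermob_binary()) instead of hard-coding the path.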

I've then updated your script to the following:

import json

from browsermobproxy import Server
from haralyzer import HarParser
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

base_url = 'https://www.botoxcosmetic.com'

# Start the BrowserMob Proxy server and create a proxy instance
server = Server("browsermob-proxy-2.1.4/bin/browsermob-proxy")
server.start()
proxy = server.create_proxy()

# Route Chrome's traffic through the proxy
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--proxy-server={0}".format(proxy.proxy))
driver = webdriver.Chrome(options=chrome_options)

driver.get("{0}/women/find-a-botox-cosmetic-specialist".format(base_url))

# Start recording a HAR, including response bodies
proxy.new_har(options={"captureContent": "true"})

# find_element(By...) also works on Selenium 4, where find_element_by_* was removed
driver.find_element(By.CLASS_NAME, 'normalZip').send_keys('10022')
driver.find_element(By.CLASS_NAME, 'normalSearch').click()
WebDriverWait(driver, 15).until(EC.presence_of_element_located((By.CSS_SELECTOR, "#specialist-results > div")))

# Find the FindSpecialists call in the HAR and parse its response body
result = None  # stays None if the call was not captured
har_parser = HarParser(proxy.har)
for entry in har_parser.har_data["entries"]:
    if entry["request"]["url"] == "{0}/sc/api/findclinic/FindSpecialists".format(base_url):
        result = json.loads(entry["response"]["content"]["text"])

driver.quit()
server.stop()

This will start up a BrowserMob Proxy instance, capture the response for the FindSpecialists network call, and store the parsed JSON in the result variable.
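If the exact URL comparison turns out to be too strict (for example, the site appends a query string), the lookup could be pulled into a small function that works on the raw HAR dictionary returned by proxy.har. This is only a sketch: the function name find_json_responses and the url_suffix parameter are my own, and it assumes the standard HAR layout (log, then entries, then request and response). It also decodes bodies that the HAR marks as base64 encoded.

```python
import base64
import json


def find_json_responses(har, url_suffix):
    """Return the parsed JSON bodies of HAR entries whose URL path ends with url_suffix.

    Hypothetical helper: har is assumed to be a HAR dictionary such as
    the one returned by proxy.har.
    """
    results = []
    for entry in har["log"]["entries"]:
        # Ignore any query string when matching the URL
        url = entry["request"]["url"].split("?", 1)[0]
        if not url.endswith(url_suffix):
            continue
        content = entry["response"].get("content", {})
        text = content.get("text")
        if not text:
            continue  # body was not captured for this entry
        # HAR marks binary-safe bodies with encoding == "base64"
        if content.get("encoding") == "base64":
            text = base64.b64decode(text).decode("utf-8")
        try:
            results.append(json.loads(text))
        except ValueError:
            continue  # body was not valid JSON
    return results
```

With the script above, find_json_responses(proxy.har, "/sc/api/findclinic/FindSpecialists") would return a list, which is empty if the call was never captured.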

You can then do whatever you need with the response. Apologies if the code is not as clean as you would expect; I'm not a native Pythonista.

Useful references are: