Capturing JSON data from intermediate events using Selenium
You will need to use a proxy; my suggestion would be BrowserMob Proxy.
First of all, install the Python libraries for BrowserMob Proxy and haralyzer (the script below imports both):
pip install browsermob-proxy haralyzer
You will then need to download the latest release of BrowserMob Proxy itself (2.1.4 at the time of writing), extract it, and place it in your project directory. This is the location you need to pass in when setting up the BrowserMob Proxy server (see where Server("browsermob-proxy-2.1.4/bin/browsermob-proxy") is defined in the script).
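A wrong path to the launcher script is an easy mistake at this step, so it can help to build the path explicitly and check it before starting the server. The following is only a sketch: the helper name browsermob_binary is made up for illustration, and it assumes the extracted release keeps its default folder name and that a .bat launcher sits next to the shell script on Windows.

```python
import os
from pathlib import Path


def browsermob_binary(project_dir, version="2.1.4"):
    """Return the expected path of the BrowserMob Proxy launcher script.

    Hypothetical helper: assumes the release was extracted into
    <project_dir>/browsermob-proxy-<version>/ without renaming it.
    """
    # Assumption: Windows uses the .bat launcher, other systems the shell script.
    script = "browsermob-proxy.bat" if os.name == "nt" else "browsermob-proxy"
    return Path(project_dir) / "browsermob-proxy-{0}".format(version) / "bin" / script


binary = browsermob_binary(Path.cwd())
if not binary.is_file():
    # Report the problem early instead of failing inside Server(...).start()
    print("BrowserMob Proxy launcher not found at {0}".format(binary))
```

The resulting path can then be passed on as Server(str(binary)).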
I've then updated your script to the following:
import json

from browsermobproxy import Server
from haralyzer import HarParser
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

base_url = 'https://www.botoxcosmetic.com'

# Start the BrowserMob Proxy server and create a proxy instance
server = Server("browsermob-proxy-2.1.4/bin/browsermob-proxy")
server.start()
proxy = server.create_proxy()

# Route Chrome's traffic through the proxy
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--proxy-server={0}".format(proxy.proxy))
driver = webdriver.Chrome(options=chrome_options)

driver.get("{0}/women/find-a-botox-cosmetic-specialist".format(base_url))

# Start recording a HAR, including response bodies
proxy.new_har(options={"captureContent": "true"})

# find_element(By...) replaces find_element_by_class_name, which newer Selenium releases removed
driver.find_element(By.CLASS_NAME, 'normalZip').send_keys('10022')
driver.find_element(By.CLASS_NAME, 'normalSearch').click()
WebDriverWait(driver, 15).until(EC.presence_of_element_located((By.CSS_SELECTOR, "#specialist-results > div")))

# Find the FindSpecialists call in the HAR and parse its body as JSON
har_parser = HarParser(proxy.har)
for entry in har_parser.har_data["entries"]:
    if entry["request"]["url"] == "{0}/sc/api/findclinic/FindSpecialists".format(base_url):
        result = json.loads(entry["response"]["content"]["text"])

driver.quit()
server.stop()
This will start up a BrowserMob Proxy instance, capture the response for the FindSpecialists network call, and store the parsed JSON in the result variable.
You can then do whatever you want with the response. Apologies if the code is not as clean as you would expect; I'm not a native Pythonista.
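If it helps to see what the lookup does without a browser running, here is a small sketch that works on a raw HAR dictionary (the shape proxy.har returns, with the entries under "log"). The function name find_json_response and the sample data are made up for illustration. The base64 branch is an assumption based on the HAR format, where the content object may carry an "encoding" field; I have not checked when BrowserMob Proxy actually sets it.

```python
import base64
import json


def find_json_response(har, url):
    """Return the parsed JSON body of the first HAR entry whose request URL equals url.

    Hypothetical helper; returns None if no matching entry has a body.
    """
    for entry in har["log"]["entries"]:
        if entry["request"]["url"] != url:
            continue
        content = entry["response"]["content"]
        text = content.get("text")
        if text is None:
            # Body was not captured (e.g. captureContent was not enabled)
            continue
        if content.get("encoding") == "base64":
            # Assumption: base64-encoded bodies are UTF-8 JSON once decoded
            text = base64.b64decode(text).decode("utf-8")
        return json.loads(text)
    return None


# Made-up HAR fragment standing in for proxy.har
sample_har = {
    "log": {
        "entries": [
            {
                "request": {"url": "https://example.test/sc/api/findclinic/FindSpecialists"},
                "response": {"content": {"text": '{"specialists": []}'}},
            }
        ]
    }
}

result = find_json_response(sample_har, "https://example.test/sc/api/findclinic/FindSpecialists")
print(result)
```

Unlike the loop in the script, this returns None when nothing matches, so a missing call shows up as None rather than as an undefined variable.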
Useful references are:
- The BrowserMob Proxy website
- The BrowserMob Proxy source code on GitHub
- The Python documentation for BrowserMob Proxy
- The haralyzer website
- The ChromeDriver website