Scraping elements rendered using React JS with BeautifulSoup


There is nothing wrong with your code. The problem is the website you are scraping: for some reason it never finishes loading, which prevents the page from being parsed and the rest of your code from running.

I ran the same approach against Wikipedia to confirm that the code itself works:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

listUrls = ["https://en.wikipedia.org/wiki/List_of_state_and_union_territory_capitals_in_India"]

# browser = webdriver.PhantomJS('/usr/local/bin/phantomjs')
# The argument is the path to the chromedriver binary
browser = webdriver.Chrome("./chromedriver")

urls = []
for url in listUrls:
    browser.get(url)
    # Parse the HTML as rendered by the browser
    soup = BeautifulSoup(browser.page_source, "html.parser")
    results = soup.find_all('a', {'class': "mw-redirect"})
    for result in results:
        link = result["href"]
        urls.append(link)
    print(urls)

Outputs:

[u'/wiki/List_of_states_and_territories_of_India_by_area', u'/wiki/List_of_Indian_states_by_GDP_per_capita', u'/wiki/Constitutional_republic', u'/wiki/States_and_territories_of_India', u'/wiki/National_Capital_Territory_of_Delhi', u'/wiki/States_Reorganisation_Act', u'/wiki/High_Courts_of_India', u'/wiki/Delhi_NCT', u'/wiki/Bengaluru', u'/wiki/Madras', u'/wiki/Andhra_Pradesh_Capital_City', u'/wiki/States_and_territories_of_India', u'/wiki/Jammu_(city)']

P.S. I'm using ChromeDriver so that the script runs against a real Chrome browser, which makes debugging easier. You can download ChromeDriver from https://chromedriver.storage.googleapis.com/index.html?path=2.27/
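
If the target site really never finishes loading, one possible workaround is to cap the page load time and parse whatever has been rendered by then. The following is only a sketch under assumptions: `load_with_timeout` is a hypothetical helper name, the 15-second default is arbitrary, and how a driver behaves after a load timeout can vary between browsers and driver versions.

```python
try:
    from selenium.common.exceptions import TimeoutException
except ImportError:
    # Fallback so the sketch can be imported without Selenium installed
    class TimeoutException(Exception):
        pass


def load_with_timeout(browser, url, seconds=15):
    """Load `url`, but stop waiting after `seconds`.

    Returns whatever page source the browser has at that point.
    `browser` is assumed to be a Selenium WebDriver instance.
    """
    browser.set_page_load_timeout(seconds)
    try:
        browser.get(url)
    except TimeoutException:
        # Ask the browser to stop loading so the current DOM can be read
        browser.execute_script("window.stop();")
    return browser.page_source


# Usage (assumes a working Selenium and BeautifulSoup setup):
# html = load_with_timeout(browser, url, seconds=10)
# soup = BeautifulSoup(html, "html.parser")
```

The returned HTML may be incomplete, so it is worth checking that the elements you need are actually present before relying on it.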


Selenium renders the page, including the JavaScript. Your code is working properly: it waits for the element to be generated. In your case, Selenium never finds that CSS element, because the URL you gave does not render the results page. Instead, it generates the following error page:

http://imgur.com/a/YwFyE

This error page does not contain the CSS class, so your code keeps waiting for an element that never appears. Try the Firefox web driver to watch what is happening.
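
Another way to confirm this is to check the rendered HTML for the class before waiting on it. Here is a minimal sketch, assuming the HTML comes from `browser.page_source`. The helper name `page_has_class`, the class name `result-item`, and the sample markup are all made up for illustration, and it uses only the standard library's `html.parser` so that it runs without extra dependencies.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collects every CSS class name that appears in the parsed HTML."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                # A class attribute may hold several space-separated names
                self.classes.update(value.split())


def page_has_class(html, class_name):
    """Return True if any element in `html` carries `class_name`."""
    collector = ClassCollector()
    collector.feed(html)
    collector.close()
    return class_name in collector.classes


# Hypothetical snippets standing in for browser.page_source
results_page = '<div class="result-item featured">ok</div>'
error_page = '<div class="error">Something went wrong</div>'

print(page_has_class(results_page, "result-item"))
print(page_has_class(error_page, "result-item"))
```

If the check returns False for the page Selenium actually loaded, the wait can never succeed, and the thing to fix is the URL or the selector rather than the waiting logic.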