Scraping hidden text of hotel reviews


If Selenium is not required, you can use requests with BeautifulSoup instead.

    import requests
    from bs4 import BeautifulSoup

    url = 'https://www.yelp.com/biz/fairmont-san-francisco-san-francisco?sort_by=rating_desc'
    page = requests.get(url)
    soup = BeautifulSoup(page.text, 'html.parser')

    # Each review body is a <p lang="en"> element
    reviews = soup.find_all('p', attrs={'lang': 'en'})
    for review in reviews:
        print(review.text)

To collect the reviews from all pages, try:

    import requests
    from bs4 import BeautifulSoup

    url = 'https://www.yelp.com/biz/fairmont-san-francisco-san-francisco?sort_by=rating_desc'
    while url:
        page = requests.get(url)
        soup = BeautifulSoup(page.text, 'html.parser')
        reviews = soup.find_all('p', attrs={'lang': 'en'})
        for review in reviews:
            print(review.text)
        # Follow the "next" link until there is none left
        next_page = soup.find('a', {'class': 'next'})
        if next_page:
            url = next_page['href']
        else:
            url = None
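
One thing to watch in the loop above: if the "next" link's href is relative rather than absolute, requests.get will reject it. A minimal sketch of a guard, assuming nothing about what the site actually returns (the helper name resolve_next and the sample href are hypothetical), is to resolve the href against the current URL with urljoin from the standard library:

```python
from urllib.parse import urljoin

BASE = 'https://www.yelp.com/biz/fairmont-san-francisco-san-francisco?sort_by=rating_desc'


def resolve_next(current_url, href):
    """Return an absolute URL for the next page, or None if there is no link."""
    if not href:
        return None
    # urljoin leaves absolute hrefs unchanged and resolves relative ones
    # against the page they were found on.
    return urljoin(current_url, href)


# Hypothetical relative href, for illustration only
print(resolve_next(BASE, '/biz/fairmont-san-francisco-san-francisco?start=20'))
```

In the loop, this would replace the direct assignment, for example url = resolve_next(url, next_page['href']).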


It seems to work with BeautifulSoup. I used Selenium to get the page source; see the code:

    from selenium import webdriver
    from bs4 import BeautifulSoup

    u = 'https://www.yelp.com/biz/fairmont-san-francisco-san-francisco?sort_by=rating_desc'
    driver = webdriver.Chrome(executable_path=r'C:\chromedriver_win32\chromedriver.exe')  # , options=options)
    driver.get(u)

    # Parse the rendered page source with BeautifulSoup
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    reviews = soup.find_all('p', attrs={'lang': 'en'})
    for review in reviews:
        print(review.text)


Your XPath is not finding the element. If you print the length of the returned list, it is zero.

Try this:

    p = driver.find_elements_by_xpath("//div[@class='review-list']/ul/li//p[@lang='en']")
    print(len(p))
    for i in p:
        print(i.text)

You can test your XPath or CSS selector in Chrome DevTools, for example by typing $x("your xpath") or $$("your css selector") in the Console.
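
A selector can also be sanity-checked offline against a small snippet before running it in the browser. The sketch below uses only the standard library; the sample markup is hypothetical and merely mimics the structure the XPath above expects. ElementTree supports only a subset of XPath and needs well-formed markup, so paths must start with ".//" instead of "//", and a real page would more likely be checked in the browser or with a full XPath engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample that mimics the expected structure
SAMPLE = """<html><body>
<div class="review-list"><ul>
<li><div><p lang="en">Great stay.</p></div></li>
<li><div><p lang="en">Lovely lobby.</p></div></li>
<li><div><p lang="fr">Tres bien.</p></div></li>
</ul></div>
<p lang="en">Outside the list.</p>
</body></html>"""

root = ET.fromstring(SAMPLE)

# Same path as the Selenium call, written relative to the root element
matches = root.findall(".//div[@class='review-list']/ul/li//p[@lang='en']")
texts = [p.text for p in matches]

print(len(texts))
for text in texts:
    print(text)
```

If the count printed here is zero for a snippet copied from the page, the path itself is the problem, not the timing of the page load.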
