Scraping hidden text of hotel reviews

python selenium web-scraping beautifulsoup lxml

If Selenium is not required, you can use requests with BeautifulSoup instead.

    import requests
    from bs4 import BeautifulSoup

    url = 'https://www.yelp.com/biz/fairmont-san-francisco-san-francisco?sort_by=rating_desc'

    # Fetch the page and parse the returned HTML
    page = requests.get(url)
    soup = BeautifulSoup(page.text, 'html.parser')

    # Review text sits in <p lang="en"> elements
    reviews = soup.find_all('p', attrs={'lang': 'en'})
    for review in reviews:
        print(review.text)
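To see what the `find_all` call selects without touching the network, here is a minimal sketch run against an inline HTML fragment. The markup, including the `hidden` class, is made up for illustration and is not Yelp's real page structure. BeautifulSoup reads text from the HTML source and does not apply CSS, so a paragraph hidden by styling is returned like any other.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; not Yelp's actual HTML.
SAMPLE_HTML = """
<div class="review">
  <p lang="en">Great location and friendly staff.</p>
  <p lang="en" class="hidden" style="display:none">Rooms were a bit dated.</p>
  <p>Navigation text without a lang attribute.</p>
</div>
"""


def extract_reviews(html):
    """Return the text of every <p lang="en"> element in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    paragraphs = soup.find_all("p", attrs={"lang": "en"})
    return [p.get_text(strip=True) for p in paragraphs]


reviews = extract_reviews(SAMPLE_HTML)
for text in reviews:
    print(text)
```

The paragraph without a `lang` attribute is skipped, while the one styled with `display:none` is still extracted.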

To collect the reviews from every page, follow the "next" link until there is none left:

    import requests
    from bs4 import BeautifulSoup
    from urllib.parse import urljoin

    url = 'https://www.yelp.com/biz/fairmont-san-francisco-san-francisco?sort_by=rating_desc'

    while url:
        page = requests.get(url)
        soup = BeautifulSoup(page.text, 'html.parser')

        reviews = soup.find_all('p', attrs={'lang': 'en'})
        for review in reviews:
            print(review.text)

        # Follow the "next" link; stop when the last page has none
        next_page = soup.find('a', {'class': 'next'})
        # urljoin copes with both relative and absolute hrefs
        url = urljoin(url, next_page['href']) if next_page else None
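When following a "next" link, the `href` may be relative or absolute depending on the site, and `requests.get` needs a full URL. A small sketch of how `urllib.parse.urljoin` from the standard library handles both cases; the two `href` values are assumptions made for illustration, not taken from the real page.

```python
from urllib.parse import urljoin

current = "https://www.yelp.com/biz/fairmont-san-francisco-san-francisco?sort_by=rating_desc"

# Hypothetical href values that a "next" link might carry.
relative_href = "/biz/fairmont-san-francisco-san-francisco?start=20&sort_by=rating_desc"
absolute_href = "https://www.yelp.com/biz/fairmont-san-francisco-san-francisco?start=40"

# A relative href is resolved against the current page's scheme and host.
next_from_relative = urljoin(current, relative_href)
# An absolute href is returned unchanged.
next_from_absolute = urljoin(current, absolute_href)

print(next_from_relative)
print(next_from_absolute)
```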


This also works with BeautifulSoup when Selenium is used to get the page source. See the code:

    from selenium import webdriver
    from bs4 import BeautifulSoup

    u = 'https://www.yelp.com/biz/fairmont-san-francisco-san-francisco?sort_by=rating_desc'

    # Adjust the path to wherever chromedriver lives on your machine
    driver = webdriver.Chrome(executable_path=r'C:\chromedriver_win32\chromedriver.exe')  # , options=options)
    driver.get(u)

    # Hand the rendered page source to BeautifulSoup
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    reviews = soup.find_all('p', attrs={'lang': 'en'})
    for review in reviews:
        print(review.text)

    # Close the browser when done
    driver.quit()


Your XPath does not match any element: if you print the length of the list it returns, the result is zero.

Try this instead:

    p = driver.find_elements_by_xpath("//div[@class='review-list']/ul/li//p[@lang='en']")
    print(len(p))
    for i in p:
        print(i.text)

You can test an XPath or CSS selector in Chrome DevTools before using it in code, for example with `$x()` for XPath or `$$()` for CSS in the Console.
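An XPath can also be checked offline against a saved fragment. The sketch below uses the standard library's `xml.etree.ElementTree` rather than Selenium; it supports only a subset of XPath and only well-formed XML, so treat it as a rough check and not a substitute for the browser. The sample markup is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup for illustration; a real page would need
# an HTML parser, since ElementTree only accepts valid XML.
SAMPLE = """
<html><body>
  <div class="review-list">
    <ul>
      <li><div><p lang="en">Great stay.</p></div></li>
      <li><p lang="en">Lovely lobby.</p><p lang="de">Sehr gut.</p></li>
    </ul>
  </div>
  <p lang="en">Not inside the review list.</p>
</body></html>
"""

root = ET.fromstring(SAMPLE.strip())

# ElementTree paths are relative to an element, hence the leading "."
xpath = ".//div[@class='review-list']/ul/li//p[@lang='en']"
matches = [p.text for p in root.findall(xpath)]

# A length of zero would mean the expression matches nothing.
print(len(matches))
for text in matches:
    print(text)
```

Only the two English paragraphs inside the review list match; the German one and the paragraph outside the list are excluded.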

CodeHunter
