Dealing with pagination when using scrapy-selenium (POST request)
The code below answers the question above.
In a nutshell, I restructured the code and it now works correctly. Some remarks:
- First, save the content of every page in a list.
- Catching "NoSuchElementException" at the end of the while/try loop is essential: before adding it, the code kept failing because it did not know what to do once the last page was reached. A standalone sketch of this loop follows the list.
- Finally, parse the stored page sources (the responses) to extract the product data.
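To illustrate the first two points in isolation, here is a minimal sketch of the pagination loop. It is standalone for clarity, so it creates its own driver; inside the spider the driver comes from response.meta['driver'] instead. Note that Selenium 4 deprecates find_element_by_xpath in favour of find_element(By.XPATH, ...), which this sketch uses:

# A minimal, standalone sketch of the pagination loop (assumes a local
# chromedriver on PATH; in the spider, use response.meta['driver']).
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException

driver = webdriver.Chrome()
driver.get('https://www.getwines.com/category_Wine')

responses = [driver.page_source]  # 1) store the content of the first page
while True:
    try:
        # Find the ">>" pagination link; its href is a javascript: call,
        # so execute it directly rather than clicking the element.
        next_page = driver.find_element(By.XPATH, "//b[text()= '>>']/parent::a")
        driver.execute_script(next_page.get_attribute('href'))
        driver.implicitly_wait(2)  # the next lookup will poll for up to 2s
        responses.append(driver.page_source)
    except NoSuchElementException:
        break  # 2) the ">>" link is absent on the last page, so stop

driver.quit()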
All in all, I think this structure works well when integrating Selenium with Scrapy. However, as I am a beginner at web scraping, any feedback on how to integrate the two more efficiently would be appreciated.
# -*- coding: utf-8 -*-
import scrapy
from scrapy import Selector
from scrapy_selenium import SeleniumRequest
from selenium.common.exceptions import NoSuchElementException


class WinesSpider(scrapy.Spider):
    name = 'wines'
    responses = []  # stores the page source of every paginated page

    def start_requests(self):
        yield SeleniumRequest(
            url='https://www.getwines.com/category_Wine',
            callback=self.parse
        )

    def parse(self, response):
        driver = response.meta['driver']

        # 1) Store the source of the first page.
        initial_page = driver.page_source
        self.responses.append(initial_page)

        # 2) Follow the ">>" pagination link until it no longer exists,
        #    storing the source of each page along the way.
        found = True
        while found:
            try:
                next_page = driver.find_element_by_xpath("//b[text()= '>>']/parent::a")
                href = next_page.get_attribute('href')
                driver.execute_script(href)  # the href is a javascript: call
                driver.implicitly_wait(2)    # next lookup polls for up to 2s
                self.responses.append(driver.page_source)
            except NoSuchElementException:
                # Raised on the last page, where the ">>" link is absent.
                break

        # 3) Parse the stored pages and yield one item per product.
        for resp in self.responses:
            r = Selector(text=resp)
            products = r.xpath("(//div[@class='layMain']//tbody)[5]/tr")
            for product in products:
                yield {
                    'product_name':
                        product.xpath(".//a[@class='Srch-producttitle']/text()").get(),
                    'product_link':
                        product.xpath(".//a[@class='Srch-producttitle']/@href").get(),
                    'product_actual_price':
                        product.xpath(".//span[@class='RegularPrice']/text()").get(),
                    'product_price_onsale':
                        product.xpath(".//td//td[3]//td/span[4]/text()").get()
                }
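For completeness: SeleniumRequest only works once the scrapy-selenium middleware and driver settings are enabled in settings.py. That part is not shown in the question, so the following is a minimal sketch assuming Chrome with a local chromedriver; adjust the driver name, path, and arguments for your setup.

# settings.py -- minimal scrapy-selenium configuration (a sketch, assuming Chrome)
SELENIUM_DRIVER_NAME = 'chrome'
SELENIUM_DRIVER_EXECUTABLE_PATH = '/path/to/chromedriver'  # placeholder path
SELENIUM_DRIVER_ARGUMENTS = ['--headless']  # run without a visible browser window

DOWNLOADER_MIDDLEWARES = {
    'scrapy_selenium.SeleniumMiddleware': 800,
}

With that in place, the spider can be run as usual, e.g. scrapy crawl wines -o wines.json.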