Script Suddenly Stops Crawling Without Error or Exception [selenium]

Script Suddenly Stops Crawling Without Error or Exception


As per your 10th revision of this question the error message...

HTTPConnectionPool(host='127.0.0.1', port=58992): Max retries exceeded with url: /session/e8beed9b-4faa-4e91-a659-56761cb604d7/element (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000022D31378A58>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))

...implies that the get() method failed raising HTTPConnectionPool error with a message Max retries exceeded.

A couple of things:

Solution

As per the Release Notes of Selenium 3.14.1:

* Fix ability to set timeout for urllib3 (#6286)

The Merge is: repair urllib3 can't set timeout!

Conclusion

Once you upgrade to Selenium 3.14.1 you will be able to set the timeout and see canonical Tracebacks and would be able to take required action.

References

A couple of relevant references:


This usecase

I have taken your full script from codepen.io - A PEN BY Anthony. I had to make a few tweaks to your existing code as follows:

  • As you have used:

      ua_string = random.choice(ua_strings)

You must import random:

    import random
  • You have created the variable next_button but haven't used it. I have clubbed up the following four lines:

      # Wait until the "Next→" link is present, then click it.
      next_button = WebDriverWait(ff, 15).until(
          EC.text_to_be_present_in_element((By.PARTIAL_LINK_TEXT, 'Next→'), 'Next→')
      )
      ff.find_element(By.PARTIAL_LINK_TEXT, 'Next→').click()

    As:

      # Clubbed form: wait for the "Next→" link, then click it (no unused variable).
      WebDriverWait(ff, 15).until(
          EC.text_to_be_present_in_element((By.PARTIAL_LINK_TEXT, 'Next→'), 'Next→')
      )
      ff.find_element(By.PARTIAL_LINK_TEXT, 'Next→').click()
  • Your modified code block will be:

      # -*- coding: utf-8 -*-
      """Amazon Gold Box deal crawler (answer version 1, reformatted).

      Scrapes discounted deals, printing the first 10 characters of each new
      product title, and pages through results via the "Next→" link.
      """
      from selenium import webdriver
      from selenium.webdriver.firefox.options import Options
      from selenium.webdriver.common.by import By
      from selenium.webdriver.support import expected_conditions as EC
      from selenium.webdriver.support.ui import WebDriverWait
      import time
      import random

      # Global state shared by all helpers below.
      ua_strings = ['Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36']
      already_scraped_product_titles = []

      def create_webdriver_instance():
          """Build a headless Firefox driver with a randomly chosen user agent."""
          ua_string = random.choice(ua_strings)
          profile = webdriver.FirefoxProfile()
          profile.set_preference('general.useragent.override', ua_string)
          options = Options()
          options.add_argument('--headless')
          # FIX: the original called webdriver.Firefox(profile) and never passed
          # `options`, so the --headless flag was silently ignored.
          return webdriver.Firefox(firefox_profile=profile, options=options)

      def fetch_ua_strings():
          """Scrape desktop user-agent strings and append them to ua_strings."""
          ff = create_webdriver_instance()
          ff.get('https://techblog.willshouse.com/2012/01/03/most-common-user-agents/')
          # FIX: use the By-style locator API for consistency with the rest of
          # the script (find_elements_by_xpath is the legacy spelling).
          ua_strings_ff_eles = ff.find_elements(By.XPATH, '//td[@class="useragent"]')
          for ua_string in ua_strings_ff_eles:
              if 'mobile' not in ua_string.text and 'Trident' not in ua_string.text:
                  ua_strings.append(ua_string.text)
          ff.quit()

      def log_in(ff):
          """Log in to Amazon to use SiteStripe for generating affiliate links.

          SECURITY: credentials are hard-coded below — move them to environment
          variables or a secrets store before sharing or committing this script.
          """
          ff.find_element(By.XPATH, '//a[@id="nav-link-yourAccount"] | //a[@id="nav-link-accountList"]').click()
          ff.find_element(By.ID, 'ap_email').send_keys('anthony_falez@hotmail.com')
          ff.find_element(By.ID, 'continue').click()
          ff.find_element(By.ID, 'ap_password').send_keys('lo0kyLoOkYig0t4h')
          ff.find_element(By.NAME, 'rememberMe').click()
          ff.find_element(By.ID, 'signInSubmit').click()

      def initiate_crawl():
          """Crawl the Gold Box deals page, printing each new discounted deal."""

          def refresh_page(url):
              # NOTE(review): refresh_page recurses once per deal/page; a very
              # long crawl can exhaust Python's recursion limit — consider
              # rewriting this as a loop.
              ff = create_webdriver_instance()
              ff.get(url)
              # Sort by "Discount - High to Low".
              ff.find_element(By.XPATH, '//*[@id="FilterItemView_sortOrder_dropdown"]/div/span[2]/span/span/span/span').click()
              ff.find_element(By.XPATH, '//a[contains(text(), "Discount - High to Low")]').click()
              items = WebDriverWait(ff, 15).until(
                  EC.visibility_of_all_elements_located((By.XPATH, '//div[contains(@id, "100_dealView_")]'))
              )
              for count, item in enumerate(items):
                  slashed_price = item.find_elements(By.XPATH, './/span[contains(@class, "a-text-strike")]')
                  active_deals = item.find_elements(By.XPATH, './/*[contains(text(), "Add to Cart")]')
                  # For Groups of Items on Sale
                  # active_deals = //*[contains(text(), "Add to Cart") or contains(text(), "View Deal")]
                  if len(slashed_price) > 0 and len(active_deals) > 0:
                      product_title = item.find_element(By.ID, 'dealTitle').text
                      if product_title not in already_scraped_product_titles:
                          already_scraped_product_titles.append(product_title)
                          url = ff.current_url
                          # Scrape Details of Each Deal
                          #extract(ff, item.find_element(By.ID, 'dealImage').get_attribute('href'))
                          print(product_title[:10])
                          ff.quit()
                          refresh_page(url)
                          break
                  # FIX: the original used `count+1 is len(items)`; `is` tests
                  # object identity, not value equality — use `==`.
                  if count + 1 == len(items):
                      try:
                          print('')
                          print('new page')
                          WebDriverWait(ff, 15).until(EC.text_to_be_present_in_element((By.PARTIAL_LINK_TEXT, 'Next→'), 'Next→'))
                          ff.find_element(By.PARTIAL_LINK_TEXT, 'Next→').click()
                          time.sleep(10)
                          url = ff.current_url
                          print(url)
                          print('')
                          ff.quit()
                          refresh_page(url)
                      except Exception as error:
                          """
                          ff.find_element(By.XPATH, '//*[@id="pagination-both-004143081429407891"]/ul/li[9]/a').click()
                          url = ff.current_url
                          ff.quit()
                          refresh_page(url)
                          """
                          print('cannot find ff.find_element(By.PARTIAL_LINK_TEXT, "Next?")')
                          print('Because of... {}'.format(error))
                          ff.quit()

          refresh_page('https://www.amazon.ca/gp/goldbox/ref=gbps_ftr_s-3_4bc8_dct_10-?gb_f_c2xvdC0z=sortOrder:BY_SCORE,discountRanges:10-25%252C25-50%252C50-70%252C70-&pf_rd_p=f5836aee-0969-4c39-9720-4f0cacf64bc8&pf_rd_s=slot-3&pf_rd_t=701&pf_rd_i=gb_main&pf_rd_m=A3DWYIK6Y9EEQB&pf_rd_r=CQ7KBNXT36G95190QJB1&ie=UTF8')

      #def extract_info(ff, url):

      fetch_ua_strings()
      initiate_crawl()
  • Console Output: With Selenium v3.14.0 and Firefox Quantum v62.0.3, I can extract the following output on the console:

      J.Rosée Si  B.Catcher   Bluetooth4  FRAM G4164  Major Crim  20% off Oh  True Blood  Prime-Line  Marathon 3  True Blood  B.Catcher   4 Film Fav  True Blood  Texture Pa  Westinghou  True Blood  ThermoPro   ...  ...  ...

Note: I could have optimized your code and performed the same web scraping operations initializing the Firefox Browser Client only once and traverse through various products and their details. But to preserve your logic and innovation I have suggested the minimal changes required to get you through.


I slightly adjusted the code and it seems to work. Changes:

Added the import random statement, because random is used and the script would not run without it.

Inside product_title loop these lines are removed:

ff.quit(), refresh_page(url) and break

The ff.quit() statement would cause a fatal (connection) error causing the script to break.

Also, is is changed to == in if count + 1 == len(items): — is tests identity, == tests value equality.

# -*- coding: utf-8 -*-
"""Amazon Gold Box deal crawler (answer version 2, reformatted).

Differences from version 1: the ff.quit()/refresh_page()/break inside the
product-title branch are removed (quitting the driver there killed the
session and broke the connection), and `is` is replaced with `==` in the
last-item check.
"""
from selenium import webdriver
from selenium.webdriver import Firefox
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
import time
import random

# Global state shared by all helpers below.
ua_strings = ['Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36']
already_scraped_product_titles = []

def create_webdriver_instance():
    """Build a headless Firefox driver with a randomly chosen user agent."""
    ua_string = random.choice(ua_strings)
    profile = webdriver.FirefoxProfile()
    profile.set_preference('general.useragent.override', ua_string)
    options = Options()
    options.add_argument('--headless')
    # FIX: the original called webdriver.Firefox(profile) and never passed
    # `options`, so the --headless flag was silently ignored.
    return webdriver.Firefox(firefox_profile=profile, options=options)

def fetch_ua_strings():
    """Scrape desktop user-agent strings and append them to ua_strings."""
    ff = create_webdriver_instance()
    ff.get('https://techblog.willshouse.com/2012/01/03/most-common-user-agents/')
    # FIX: use the By-style locator API for consistency with the rest of the
    # script (find_elements_by_xpath is the legacy spelling).
    ua_strings_ff_eles = ff.find_elements(By.XPATH, '//td[@class="useragent"]')
    for ua_string in ua_strings_ff_eles:
        if 'mobile' not in ua_string.text and 'Trident' not in ua_string.text:
            ua_strings.append(ua_string.text)
    ff.quit()

def initiate_crawl():
    """Crawl the Gold Box deals page, printing each new discounted deal."""

    def refresh_page(url):
        # NOTE(review): refresh_page recurses once per page; a very long crawl
        # can exhaust Python's recursion limit — consider rewriting as a loop.
        ff = create_webdriver_instance()
        ff.get(url)
        # Sort by "Discount - High to Low".
        ff.find_element(By.XPATH, '//*[@id="FilterItemView_sortOrder_dropdown"]/div/span[2]/span/span/span/span').click()
        ff.find_element(By.XPATH, '//a[contains(text(), "Discount - High to Low")]').click()
        items = WebDriverWait(ff, 15).until(
            EC.visibility_of_all_elements_located((By.XPATH, '//div[contains(@id, "100_dealView_")]'))
        )
        print(items)
        for count, item in enumerate(items):
            slashed_price = item.find_elements(By.XPATH, './/span[contains(@class, "a-text-strike")]')
            active_deals = item.find_elements(By.XPATH, './/*[contains(text(), "Add to Cart")]')
            # For Groups of Items on Sale
            # active_deals = //*[contains(text(), "Add to Cart") or contains(text(), "View Deal")]
            if len(slashed_price) > 0 and len(active_deals) > 0:
                product_title = item.find_element(By.ID, 'dealTitle').text
                if product_title not in already_scraped_product_titles:
                    already_scraped_product_titles.append(product_title)
                    url = ff.current_url
                    # Scrape Details of Each Deal
                    #extract(ff, item.find_element(By.ID, 'dealImage').get_attribute('href'))
                    print(product_title[:10])
                    # This ff.quit()-line breaks connection which breaks things.:
                    #ff.quit()
                    # And why
                    #refresh_page(url)
                    #break
            # 'is' tests for object equality; == tests for value equality:
            if count + 1 == len(items):
                try:
                    print('')
                    print('new page')
                    # FIX: the original bound this wait to an unused
                    # `next_button` variable; the return value is not needed.
                    WebDriverWait(ff, 15).until(
                        EC.text_to_be_present_in_element((By.PARTIAL_LINK_TEXT, 'Next→'), 'Next→')
                    )
                    ff.find_element(By.PARTIAL_LINK_TEXT, 'Next→').click()
                    time.sleep(3)
                    url = ff.current_url
                    print(url)
                    print('')
                    ff.quit()
                    refresh_page(url)
                except Exception as error:
                    """
                    ff.find_element(By.XPATH, '//*[@id="pagination-both-004143081429407891"]/ul/li[9]/a').click()
                    url = ff.current_url
                    ff.quit()
                    refresh_page(url)
                    """
                    print('cannot find ff.find_element(By.PARTIAL_LINK_TEXT, "Next→")')
                    print('Because of... {}'.format(error))
                    ff.quit()

    refresh_page('https://www.amazon.ca/gp/goldbox/ref=gbps_ftr_s-3_4bc8_dct_10-?gb_f_c2xvdC0z=sortOrder:BY_SCORE,discountRanges:10-25%252C25-50%252C50-70%252C70-&pf_rd_p=f5836aee-0969-4c39-9720-4f0cacf64bc8&pf_rd_s=slot-3&pf_rd_t=701&pf_rd_i=gb_main&pf_rd_m=A3DWYIK6Y9EEQB&pf_rd_r=CQ7KBNXT36G95190QJB1&ie=UTF8')

#def extract_info(ff, url):

fetch_ua_strings()
initiate_crawl()