
Scraping a Page Generated by Search/Authentication


You can use Selenium from Python to open the query page, find the search box, type your input (the grant ID in your case) with send_keys(), and then click the Search button with click() (or trigger the HTML form's submit action with submit()). Selenium then lands on the results page exactly as a normal browser would, even if the GET request parameters are generated dynamically, whether by JavaScript, by server-side session variables tied to a cookie ID, or by something else. The results page's HTML ends up in the driver's page_source attribute, which you can scrape with a regular expression or BeautifulSoup. If the results page is itself generated on the fly by something like JavaScript, you can again use Selenium to find what you want in the generated page.
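The steps above can be sketched roughly as follows. This is only an illustration under assumptions: the field name "grantId", the function names, and the idea that the results arrive as table cells after a full page load are all hypothetical, since the real page source isn't available.

```python
import re

# Matches the contents of each <td>...</td> cell in the results HTML.
CELL_RE = re.compile(r"<td[^>]*>(.*?)</td>", re.IGNORECASE | re.DOTALL)


def submit_search(query_url, grant_id, box_name="grantId", timeout=10):
    """Fill in the search form and return the HTML of the results page."""
    # Imported here so extract_cells() below is usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(query_url)
        box = driver.find_element(By.NAME, box_name)  # hypothetical field name
        box.send_keys(grant_id)
        box.submit()  # alternatively, locate the Search button and call click()
        # Assumes the form triggers a full page load: the old search box then
        # goes stale, which signals that page_source now holds the results page.
        WebDriverWait(driver, timeout).until(EC.staleness_of(box))
        return driver.page_source
    finally:
        driver.quit()


def extract_cells(html):
    """Return the text of every <td> cell, with inner tags stripped."""
    cells = []
    for raw in CELL_RE.findall(html):
        text = re.sub(r"<[^>]+>", "", raw)   # drop nested tags
        cells.append(" ".join(text.split()))  # collapse whitespace
    return cells
```

You would call it along the lines of extract_cells(submit_search(url, "your grant ID")). A regular expression is enough for simple, regular markup; for anything messier, BeautifulSoup is the more robust choice.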

Elements on the page, such as the search box, can be located in several ways. If the element has a unique "name" or "id" attribute in the HTML, that is usually easiest; otherwise, try an XPath query or a CSS selector. Since you only posted a screenshot of the page, we can't look at the source code to tell exactly what will work.
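Because the page source is unknown, one option is to try several locators in order of preference and take the first that matches. The sketch below does that; every locator value in it ("grant-id", "grantId", and the selectors) is a made-up placeholder to be replaced once you inspect the real HTML.

```python
def search_box_candidates():
    """Return (strategy, value) pairs to try, most specific first."""
    # Imported here so find_first() can be used without Selenium installed.
    from selenium.webdriver.common.by import By

    return [
        (By.ID, "grant-id"),                          # hypothetical id attribute
        (By.NAME, "grantId"),                         # hypothetical name attribute
        (By.CSS_SELECTOR, "form input[type='text']"),
        (By.XPATH, "//form//input[@type='text']"),
    ]


def find_first(driver, locators):
    """Return the first element matched by any of the given locators."""
    for by, value in locators:
        # find_elements() returns an empty list instead of raising
        # when nothing matches, so no exception handling is needed.
        matches = driver.find_elements(by, value)
        if matches:
            return matches[0]
    raise LookupError("none of the locators matched an element")
```

With a live driver this would be used as find_first(driver, search_box_candidates()). Putting the id and name lookups first reflects the point above that unique attributes are usually the simplest and most stable handles.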

You will need to set up Python with Selenium and a WebDriver (e.g., ChromeDriver) if you don't already have them. The script can run with a GUI (a browser window pops up on your screen and you see the form being filled out by Python) or headless (hidden). If you want to take a crack at the code and post a snippet, people can comment on it. In the meantime, here are a couple of tutorials on this general technique that can almost certainly be adapted to scrape your site:

https://www.scrapingbee.com/blog/selenium-python/

https://www.tutorialspoint.com/what-are-the-ways-of-submitting-a-form-in-selenium-with-python
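For the setup mentioned above, here is a minimal sketch of switching between a visible and a headless Chrome session. It assumes Selenium 4 installed with "pip install selenium" and a reasonably recent Chrome; the helper names and the window size are illustrative choices, not anything required by Selenium. Recent Selenium releases can usually locate or download a matching ChromeDriver on their own, but that depends on your version.

```python
def chrome_arguments(headless=True, window_size=(1280, 1024)):
    """Build the command-line arguments passed to Chrome."""
    args = [f"--window-size={window_size[0]},{window_size[1]}"]
    if headless:
        # "--headless=new" is the flag for newer Chrome builds;
        # older builds may need plain "--headless" instead.
        args.append("--headless=new")
    return args


def make_driver(headless=True):
    """Start Chrome with or without a visible window."""
    # Imported here so chrome_arguments() works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in chrome_arguments(headless):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

While developing, make_driver(headless=False) lets you watch the form being filled in; once the script works, headless mode is more convenient for unattended runs. Remember to call quit() on the driver when you are done.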