Python 3: filling out a form with the requests library returns the same page's HTML without the input parameters applied
The first observation is that you seem to be using a `get` request in your example code instead of a `post` request, while the form on the page is submitted with POST:

```
<form action="taa_search.cfm" method="post" ...>
                              ^^^^^^^^^^^^^
```
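To see concretely what changes between the two methods, here is a small sketch that only *prepares* both requests with `requests.Request(...).prepare()`, so nothing is sent over the network. The URL and field names match the search form; the case number `10000` is just an example value.

```python
import requests

BASE_URL = 'https://www.doleta.gov/tradeact/taa'
payload = {'form_name': 'number_search', 'input': 10000}

# Build (but do not send) both kinds of request so they can be compared.
get_req = requests.Request(
    'GET', '{}/taa_search.cfm'.format(BASE_URL), params=payload
).prepare()
post_req = requests.Request(
    'POST', '{}/taa_search.cfm'.format(BASE_URL), data=payload
).prepare()

# GET: the fields are appended to the URL as a query string; no body.
print(get_req.url)
print(get_req.body)

# POST: the URL has no query string; the fields are form-encoded in the
# body, which is what a method="post" form submits.
print(post_req.url)
print(post_req.body)
print(post_req.headers['Content-Type'])
```

A server-side script that reads its input from the POST body will not see fields sent in the query string, which would explain why a `get` gives you the unfilled form back.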
After switching to a `post` request, though, I was still getting the same result as you (the HTML of the main search form page). After a bit of experimentation, I was able to get the proper HTML results by adding a `referer` to the request headers.
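If you want to confirm that the header is actually part of the request, a prepared request can be inspected offline. The helper `build_search_request` and its `send_referer` flag below are hypothetical names for illustration; on a real call you can look at `r.request.headers` on the response instead. My guess (unconfirmed) is that the server checks the referer to make sure the search was submitted from its own form page.

```python
import requests

BASE_URL = 'https://www.doleta.gov/tradeact/taa'


def build_search_request(case_number, send_referer=True):
    """Prepare (without sending) the search POST, optionally with a Referer.

    Hypothetical helper, only meant for inspecting the outgoing request.
    """
    headers = {}
    if send_referer:
        # Claim we came from the search form page, as a browser would.
        headers['Referer'] = '{}/taa_search_form.cfm'.format(BASE_URL)
    return requests.Request(
        'POST',
        '{}/taa_search.cfm'.format(BASE_URL),
        data={'form_name': 'number_search', 'input': case_number},
        headers=headers,
    ).prepare()


with_ref = build_search_request(10000)
without_ref = build_search_request(10000, send_referer=False)

# Prepared headers are case-insensitive, so 'Referer' and 'referer' match.
print(with_ref.headers.get('Referer'))
print(without_ref.headers.get('Referer'))
```

The two requests have identical bodies; the only difference is the `Referer` header.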
Here is the code (I commented out the part that writes the result to a file, for example purposes):
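```python
import requests

BASE_URL = 'https://www.doleta.gov/tradeact/taa'


def get_case_decision(case_number):
    # Without this header the server returned the search form page
    # instead of the results.
    headers = {
        'referer': '{}/taa_search_form.cfm'.format(BASE_URL)
    }
    # Form fields for the search-by-number form.
    payload = {
        'form_name': 'number_search',
        'input': case_number
    }
    # The form uses method="post", so send the fields as a POST body.
    r = requests.post(
        '{}/taa_search.cfm'.format(BASE_URL),
        data=payload,
        headers=headers
    )
    r.raise_for_status()  # raise an exception on 4xx/5xx responses
    # with open('requests_results_{}.html'.format(case_number), 'wb') as f:
    #     f.write(r.content)
    return r.text
```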
Testing:
```python
>>> result = get_case_decision(10000)
>>> 'MODINE MFG. COMPANY' in result
True
>>> '9/12/1980' in result
True
>>> result = get_case_decision(10001)
>>> 'MUSKIN CORPORATION' in result
True
>>> '2/27/1981' in result
True
```
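Since you mentioned that you need to perform this lookup ~10,000 times, you will probably want to look into using `requests.Session` as well, so that the connection to the server is reused across requests instead of being reopened for every case number.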
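A minimal sketch of the `requests.Session` suggestion, assuming you loop over a list of case numbers. The session keeps the connection to the server open between requests and sends the `referer` header set on it with every request. The function name `get_case_decisions`, the optional `session` parameter (handy for testing), and the 30-second `timeout` are my own choices, not anything the site requires.

```python
import requests

BASE_URL = 'https://www.doleta.gov/tradeact/taa'


def get_case_decisions(case_numbers, session=None):
    """Yield (case_number, html) pairs, reusing one HTTP session.

    Hypothetical helper; pass your own session to control its lifetime.
    """
    own_session = session is None
    if own_session:
        session = requests.Session()
    # Headers set on the session are sent with every request it makes.
    session.headers.update(
        {'referer': '{}/taa_search_form.cfm'.format(BASE_URL)}
    )
    try:
        for case_number in case_numbers:
            r = session.post(
                '{}/taa_search.cfm'.format(BASE_URL),
                data={'form_name': 'number_search', 'input': case_number},
                timeout=30,  # assumed value; avoids hanging forever
            )
            r.raise_for_status()
            yield case_number, r.text
    finally:
        # Only close sessions this function created itself.
        if own_session:
            session.close()
```

Usage would look like `for case_number, html in get_case_decisions(range(10000, 10010)): ...`. Note that if you pass in your own session, the `referer` header is added to that session's defaults.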