PHP Goutte / CURL - Complete ASPX Form
Well, I'm not familiar with Goutte, but using the w3zone/crawler package I've made a quick example that scrapes the content of that link.
Install it using:
composer require w3zone/Crawler
Then use it for your case as follows:
    <?php
    require_once __DIR__ . '/vendor/autoload.php';

    use w3zone\Crawler\{Crawler, Services\phpCurl};

    $crawler = new Crawler(new phpCurl);
    $link = 'https://wyobiz.wy.gov/Business/FilingSearch.aspx';

    // Load the search page first: ASP.NET expects its hidden state fields to be posted back.
    $homePage = $crawler->get($link)->run();

    preg_match('#<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="(.*?)"\s*/>#', $homePage['body'], $viewState);
    preg_match('#<input type="hidden" name="__VIEWSTATEGENERATOR" id="__VIEWSTATEGENERATOR" value="(.*?)"\s*/>#', $homePage['body'], $viewGen);
    preg_match('#<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="(.*?)"\s*/>#', $homePage['body'], $eventVal);

    $postData = array(
        '__VIEWSTATE' => $viewState[1],
        '__LASTFOCUS' => '',
        '__EVENTTARGET' => '',
        '__EVENTARGUMENT' => '',
        '__VIEWSTATEGENERATOR' => $viewGen[1],
        '__EVENTVALIDATION' => $eventVal[1],
        // This key was listed twice; PHP keeps only the last value, so only that one is kept here.
        'ctl00$MainContent$myScriptManager' => 'ctl00$MainContent$UpdatePanel1|ctl00$MainContent$cmdSearch',
        'ctl00$MainContent$txtFilingName' => 'test',
        'ctl00$MainContent$searchOpt' => 'chkSearchStartWith',
        'ctl00$MainContent$txtFilingID' => '',
        'ctl00$MainContent$cmdSearch' => 'Search',
        '__ASYNCPOST' => 'true',
    );

    $response = $crawler->post(['url' => $link, 'data' => $postData])->dumpHeaders()->run();

    // Escape the body so markup inside the response cannot close the textarea early.
    echo "<textarea style='width: 90%; height: 200px;'>" . htmlspecialchars($response['body']) . "</textarea>";
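The regular expressions above only match when the attributes appear in exactly that order and the tag is self-closed. As a rough alternative, here is a minimal sketch that collects every hidden input with PHP's built-in DOM extension. The helper name extractHiddenFields() and the sample markup are made up for illustration, and the XPath test assumes the type attribute is written in lowercase.

```php
<?php
// Hypothetical helper: return all hidden <input> fields as a name => value map.
function extractHiddenFields(string $html): array
{
    $doc = new DOMDocument();
    // Real pages are rarely valid HTML, so collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $fields = [];
    $xpath = new DOMXPath($doc);
    // Assumes type="hidden" is lowercase, as ASP.NET normally renders it.
    foreach ($xpath->query('//input[@type="hidden"][@name]') as $input) {
        $fields[$input->getAttribute('name')] = $input->getAttribute('value');
    }
    return $fields;
}

// Made-up sample markup with shortened values and mixed attribute order.
$sample = '<form>'
    . '<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc123" />'
    . '<input value="xyz789" id="__EVENTVALIDATION" name="__EVENTVALIDATION" type="hidden">'
    . '<input type="text" name="ctl00$MainContent$txtFilingName" value="">'
    . '</form>';

$hidden = extractHiddenFields($sample);
print_r($hidden);
```

With something like this, the state fields could be merged into the POST data with array_merge(extractHiddenFields($homePage['body']), [...]) instead of being listed one by one.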
The problem for me was that the ASP.NET asynchronous response isn't HTML; it's pipe-delimited text with HTML in it:
<html>1|#||4|6079|updatePanel|ctl00_MainContentPlaceHolder_ucLicenseLookup_UpdtPanelGridLookup| <div class="modal-window-lookup-results fade bs-example-modal-lg in"> <div class="modal-header"> [...]</html>
So when Goutte feeds it to BrowserKit, it breaks. Goutte doesn't suck; you just can't feed it non-HTML garbage.
To get around this in a hurry I just did:
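    $crawler = $client->request('POST', $url, $params);
    // This is a broken crawler because the response is not HTML!

    // Work on the raw response body instead.
    $html = $client->getResponse()->getContent();
    // Drop everything before the first <div (the pipe-delimited header).
    $html = substr($html, strpos($html, "<div"));
    // Cut at the first "|hiddenField|" segment; the -3 offset was tuned to this particular response.
    $html = substr($html, 0, strpos($html, "|hiddenField|") - 3);
    // Wrap the fragment so it parses as a document.
    $html = "<!DOCTYPE html><html>$html</html>";
    $crawler = new \Symfony\Component\DomCrawler\Crawler($html);
    print $crawler->html();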
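If the substring offsets turn out to be too fragile, the response can also be split into its segments. The sketch below assumes the body is a sequence of length|type|id|content| segments, which is what the sample above looks like (1|#||4| followed by 6079|updatePanel|...). The function name parseAsyncPostback() and the sample data are made up for illustration. The lengths are treated as byte counts here, which only holds for single-byte content; for multibyte pages the mb_* functions would be needed.

```php
<?php
// Hypothetical helper: split an async postback body into its segments.
// Assumed layout of each segment: length|type|id|content|
function parseAsyncPostback(string $body): array
{
    $segments = [];
    $pos = 0;
    $total = strlen($body);

    while ($pos < $total) {
        // Read the three header fields: length, type, id.
        $header = [];
        for ($i = 0; $i < 3; $i++) {
            $bar = strpos($body, '|', $pos);
            if ($bar === false) {
                break 2; // Truncated or unexpected input: stop parsing.
            }
            $header[] = substr($body, $pos, $bar - $pos);
            $pos = $bar + 1;
        }
        [$length, $type, $id] = $header;

        // The length field must be a plain non-negative integer.
        if ($length === '' || strspn($length, '0123456789') !== strlen($length)) {
            break;
        }

        // Assumption: length counts bytes (true for ASCII-only content).
        $content = substr($body, $pos, (int) $length);
        $pos += (int) $length + 1; // Skip the content and its trailing "|".

        $segments[] = ['type' => $type, 'id' => $id, 'content' => $content];
    }

    return $segments;
}

// Made-up sample response; the HTML deliberately contains a "|" character.
$panelHtml = '<div class="results"><p>Example Co | Active</p></div>';
$sample = '1|#||4|'
    . strlen($panelHtml) . '|updatePanel|ctl00_MainContent_UpdatePanel1|' . $panelHtml . '|'
    . '12|hiddenField|__VIEWSTATE|abcdefghijkl|';

$segments = parseAsyncPostback($sample);

// Print only the HTML fragments of the update panels.
foreach ($segments as $segment) {
    if ($segment['type'] === 'updatePanel') {
        echo $segment['content'], PHP_EOL;
    }
}
```

Each updatePanel content could then be wrapped in "<!DOCTYPE html><html>...</html>" and passed to \Symfony\Component\DomCrawler\Crawler, as in the quick fix above. The hiddenField segments carry the refreshed state values, which should help if a follow-up request is needed.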