How to scrape a page with curl after the browser has interpreted it
You could try something like ScrapeGoat, which actually renders the website in a browser and lets you inject JavaScript that sends back whatever you extract.
There is a jsfiddle example here:
```javascript
var fd = {
  url: prompt("url", "http://"),
  inject: `
    var body = document.querySelector("body").innerText;
    response.send(body);
  `
};

fetch("https://scrapegoat.p.mashape.com/", {
  method: "POST",
  body: JSON.stringify(fd),
  headers: {
    "Content-Type": "application/json",
    "X-Mashape-Key": "dFYPWXxpp3mshKD6Kimb4pVfvYLvp1YWcWfjsnErOY3HN8zs4a"
  }
})
  .then(res => res.text())
  .then(text => {
    document.body.style.whiteSpace = "pre";
    document.body.innerText = text;
  });
```
You should probably get your own Mashape key.
With curl, the same request would look something like this:
```shell
curl -X POST --include 'https://scrapegoat.p.mashape.com/' \
  -H 'X-Mashape-Key: <required>' \
  -H 'Content-Type: application/json' \
  --data-binary '{"url":"http://example.com","inject":"response.send(document.querySelector(\"body\").innerText);"}'
```

Note that the double quotes inside the `inject` string must be escaped as `\"`, otherwise the JSON body is invalid.
A curl request has no notion of waiting for the scripts on a page to execute. However, you could use a headless browser (e.g. PhantomJS) to achieve your goal. With a headless browser you have access to the DOM and the other properties of a real web browser, so you can read the data you want at any point in the page's life cycle.
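As a minimal sketch of the PhantomJS approach: the script below loads a URL, waits briefly so in-page scripts can run, then reads the rendered body text. The one-second delay and the command-line URL handling are assumptions you would tune for your pages; run it as `phantomjs scrape.js http://example.com`.

```javascript
// scrape.js — minimal PhantomJS sketch: render a page, then dump its body text.
var system = require('system');
var page = require('webpage').create();

// URL from the command line, with a fallback (assumption for this sketch).
var url = system.args[1] || 'http://example.com';

page.open(url, function (status) {
  if (status !== 'success') {
    console.log('Failed to load ' + url);
    phantom.exit(1);
  }
  // Give in-page scripts a moment to finish before reading the DOM.
  window.setTimeout(function () {
    var text = page.evaluate(function () {
      // Runs inside the page context, after JavaScript has executed.
      return document.body.innerText;
    });
    console.log(text);
    phantom.exit();
  }, 1000);
});
```

`page.evaluate` executes in the page's own context, so it sees the DOM as the browser built it, not the raw HTML that curl would fetch.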