How to use XPath in Chrome headless + Puppeteer evaluate()?
$x() is not a standard JavaScript method for selecting elements by XPath. It is only a helper in Chrome DevTools. The documentation says so explicitly:
Note: This API is only available from within the console itself. You cannot access the Command Line API from scripts on the page.
And the function passed to page.evaluate() counts as a "script on the page" here.
You have two options:
- Use document.evaluate() inside page.evaluate()
Here is an example of selecting an element (the featured article) inside page.evaluate():
    const puppeteer = require('puppeteer');

    (async () => {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();
      await page.goto('https://en.wikipedia.org', { waitUntil: 'networkidle2' });

      const text = await page.evaluate(() => {
        // $x() is not a JS standard -
        // this is only sugar syntax in chrome devtools
        // use document.evaluate()
        const featureArticle = document
          .evaluate(
            '//*[@id="mp-tfa"]',
            document,
            null,
            XPathResult.FIRST_ORDERED_NODE_TYPE,
            null
          )
          .singleNodeValue;

        return featureArticle.textContent;
      });

      console.log(text);
      await browser.close();
    })();
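One thing the example above does not handle is an XPath that matches nothing: singleNodeValue is then null and reading textContent throws. A minimal sketch of a guarded page function follows. The name textByXPath is made up for illustration, and the function assumes it runs in the browser context, where document and XPathResult are globals.

```javascript
// Page function: it runs inside the page, so it may only rely on browser
// globals (document, XPathResult) and on its own arguments.
// The name textByXPath is hypothetical.
function textByXPath(expression) {
  const node = document.evaluate(
    expression,
    document,
    null,
    XPathResult.FIRST_ORDERED_NODE_TYPE,
    null
  ).singleNodeValue;

  // singleNodeValue is null when the expression matches nothing.
  return node ? node.textContent : null;
}

// Possible usage with Puppeteer (not executed here); the XPath string is
// passed as an argument instead of being hard-coded in the page function:
//   const text = await page.evaluate(textByXPath, '//*[@id="mp-tfa"]');
```

Passing the expression as an argument to page.evaluate() keeps the page function reusable for other XPath queries.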
- Select the element with Puppeteer's page.$x() and pass it to page.evaluate()
This example achieves the same result as the first example:
    const puppeteer = require('puppeteer');

    (async () => {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();
      await page.goto('https://en.wikipedia.org', { waitUntil: 'networkidle2' });

      // await page.$x() returns array of ElementHandle
      // we are only interested in the first element
      const featureArticle = (await page.$x('//*[@id="mp-tfa"]'))[0];
      // the same as:
      // const featureArticle = await page.$('#mp-tfa');

      const text = await page.evaluate(el => {
        // do what you want with featureArticle in page.evaluate
        return el.textContent;
      }, featureArticle);

      console.log(text);
      await browser.close();
    })();
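Because page.$x() returns an array, indexing [0] yields undefined when nothing matches. The sketch below wraps the same approach with a guard for that case and disposes of the handles afterwards. The name firstTextByXPath is hypothetical, and it assumes a Puppeteer version that still provides page.$x().

```javascript
// Sketch: read the text of the first XPath match, or null if there is none.
// `page` is assumed to be a Puppeteer Page that provides page.$x().
// The name firstTextByXPath is hypothetical.
async function firstTextByXPath(page, expression) {
  const handles = await page.$x(expression);
  try {
    if (handles.length === 0) {
      return null; // nothing matched, so there is no handle to pass on
    }
    // ElementHandles passed as arguments arrive in the page as DOM elements.
    return await page.evaluate(el => el.textContent, handles[0]);
  } finally {
    // Release the ElementHandles once the text has been read.
    await Promise.all(handles.map(handle => handle.dispose()));
  }
}

// Possible usage (not executed here):
//   const text = await firstTextByXPath(page, '//*[@id="mp-tfa"]');
```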
Here is a related question on how to inject a $x() helper function into your scripts.
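As a rough idea of what such a helper could look like, here is a sketch of a $x()-like function built on document.evaluate(). It is not the DevTools implementation. The name xpathAll is made up, and the function assumes it runs in the browser context.

```javascript
// Sketch of a $x()-like helper: returns every node matching an XPath
// expression as a plain array. Meant to run in the page, where document
// and XPathResult are globals. The name xpathAll is hypothetical.
function xpathAll(expression, contextNode) {
  const snapshot = document.evaluate(
    expression,
    contextNode || document,
    null,
    XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,
    null
  );

  const nodes = [];
  for (let i = 0; i < snapshot.snapshotLength; i++) {
    nodes.push(snapshot.snapshotItem(i));
  }
  return nodes;
}

// One possible way to expose it to page scripts with Puppeteer (not
// executed here): serialize the function into a script tag after the page
// has loaded. A strict Content Security Policy may block this.
//   await page.addScriptTag({ content: `window.$x = ${xpathAll.toString()};` });
//   const linkCount = await page.evaluate(() => $x('//a').length);
```

A helper injected this way only exists in the current document, so it would need to be added again after each navigation.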
If you insist on using page.$x(), you can simply pass the result to page.evaluate():

    const example = await page.evaluate(element => {
      return element.textContent;
    }, (await page.$x('//*[@id="result"]'))[0]);