How can I capture all network requests and full response data when loading a page in Chrome?



You can enable request interception with page.setRequestInterception() and then, inside the page.on('request') handler, use the request-promise-native module to act as a middleman that gathers the response data before the request is continued with request.continue() in Puppeteer.

Here's a full working example:

'use strict';

const puppeteer = require('puppeteer');
const request_client = require('request-promise-native');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  const result = [];

  await page.setRequestInterception(true);

  page.on('request', request => {
    request_client({
      uri: request.url(),
      resolveWithFullResponse: true,
    }).then(response => {
      const request_url = request.url();
      const request_headers = request.headers();
      const request_post_data = request.postData();
      const response_headers = response.headers;
      const response_size = response_headers['content-length'];
      const response_body = response.body;

      result.push({
        request_url,
        request_headers,
        request_post_data,
        response_headers,
        response_size,
        response_body,
      });

      console.log(result);
      request.continue();
    }).catch(error => {
      console.error(error);
      request.abort();
    });
  });

  await page.goto('https://example.com/', {
    waitUntil: 'networkidle0',
  });

  await browser.close();
})();


Puppeteer-only solution

This can be done with puppeteer alone. The problem you describe, that response.buffer() is cleared on navigation, can be circumvented by processing the requests one after another.

How it works

The code below uses page.setRequestInterception to intercept all requests. If a request is currently being processed or waited on, new requests are put into a queue. Then, response.buffer() can be used without the risk that other requests asynchronously wipe the buffer, as there are no parallel requests. As soon as the currently processed request/response is handled, the next request is processed.

Code

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch();
    const [page] = await browser.pages();

    const results = []; // collects all results

    let paused = false;
    let pausedRequests = [];

    const nextRequest = () => { // continue the next request or "unpause"
        if (pausedRequests.length === 0) {
            paused = false;
        } else {
            // continue first request in "queue"
            (pausedRequests.shift())(); // calls the request.continue function
        }
    };

    await page.setRequestInterception(true);
    page.on('request', request => {
        if (paused) {
            pausedRequests.push(() => request.continue());
        } else {
            paused = true; // pause, as we are processing a request now
            request.continue();
        }
    });

    page.on('requestfinished', async (request) => {
        const response = await request.response();

        const responseHeaders = response.headers();
        let responseBody;
        if (request.redirectChain().length === 0) {
            // body can only be accessed for non-redirect responses
            responseBody = await response.buffer();
        }

        const information = {
            url: request.url(),
            requestHeaders: request.headers(),
            requestPostData: request.postData(),
            responseHeaders: responseHeaders,
            responseSize: responseHeaders['content-length'],
            responseBody,
        };
        results.push(information);

        nextRequest(); // continue with next request
    });
    page.on('requestfailed', (request) => {
        // handle failed request
        nextRequest();
    });

    await page.goto('...', { waitUntil: 'networkidle0' });
    console.log(results);

    await browser.close();
})();


I would suggest searching for a fast proxy server that can write request logs together with the actual content.

The target setup is to let the proxy server simply write a log file, and then analyze that log, searching for the information you need.
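
For illustration, here is a minimal sketch of the browser side of that setup, assuming a logging proxy (such as Squid) is already running on 127.0.0.1:8080 (the address and port are example values, not part of the original suggestion). Chrome is simply pointed at the proxy via the --proxy-server flag, and all capturing happens in the proxy's own log.

const puppeteer = require('puppeteer');

(async () => {
    // Point Chrome at the logging proxy; the proxy itself is assumed to be
    // running separately on 127.0.0.1:8080 and writing its own log file.
    const browser = await puppeteer.launch({
        args: ['--proxy-server=127.0.0.1:8080'],
    });
    const page = await browser.newPage();

    // No request interception here: the browser just loads the page while
    // the proxy records every request and response passing through it.
    await page.goto('https://example.com/', { waitUntil: 'networkidle0' });

    await browser.close();
})();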

Don't intercept requests while the proxy is working (this will lead to a slowdown).

The performance issues you may encounter (with the proxy-as-logger setup) are mostly related to TLS support; make sure the proxy is configured for a quick TLS handshake and supports the HTTP/2 protocol.

E.g. Squid benchmarks show that it is able to process hundreds of requests per second, which should be enough for testing purposes.