Retrieving website HTML page using cURL with current session and cookie data on secured page


OK, it is solved, after a lot of research.

Cookie data is passed, but that does not make it session data. This was fixed using the following method:

    private function Cookie2Session($name)
    {
        if (filter_input(INPUT_COOKIE, $name))
        {
            $_SESSION[$name] = filter_input(INPUT_COOKIE, $name);
        }
    }

    // following lines put within the BORDER_PATROL Method
    if (filter_input(INPUT_COOKIE, 'pdfCurl'))
    {
        $this->Cookie2Session('user_id');
        $this->Cookie2Session('username');
        $this->Cookie2Session('login_string');
        $this->Cookie2Session('REMOTE_ADDR');
        $this->Cookie2Session('HTTP_USER_AGENT');
        $_SESSION['new_session'] = "true";
    }
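To make the cookie-to-session step easier to test in isolation, here is a minimal sketch of the same idea as a plain function. The name copyCookiesToSession, the $allowed whitelist and the sample values are my own assumptions, not part of the original class; it works on plain arrays instead of $_COOKIE and $_SESSION so it can run anywhere.

```php
<?php
// Hypothetical helper (not part of the original class): copy only the
// whitelisted cookie values into a session array and return the result.
function copyCookiesToSession(array $cookies, array $session, array $allowed): array
{
    foreach ($allowed as $name) {
        // Skip missing, non-string, or empty values
        if (isset($cookies[$name]) && is_string($cookies[$name]) && $cookies[$name] !== '') {
            $session[$name] = $cookies[$name];
        }
    }
    return $session;
}

// Sample data standing in for $_COOKIE and $_SESSION (assumed values)
$cookies = ['pdfCurl' => 'true', 'user_id' => '42', 'unexpected' => 'x'];
$session = [];

// Only copy when the marker cookie set by the cURL request is present
if (isset($cookies['pdfCurl'])) {
    $session = copyCookiesToSession($cookies, $session, ['user_id', 'username', 'login_string']);
}
```

Only names on the whitelist are copied, so an extra cookie sent along with the request never ends up in the session.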

Small alteration to the method _login_check()

    // Login Check if user is logged in correctly
    private function _login_check()
    {
        // Database variables
        $db_accounts = mysqli_connect($this->mySQL_accounts_host, $this->mySQL_accounts_username, $this->mySQL_accounts_password, $this->mySQL_accounts_database);

        // Check if all session variables are set
        if (isset($_SESSION['user_id'], $_SESSION['username'], $_SESSION['login_string']))
        {
            $user_id = $_SESSION['user_id'];
            $login_string = $_SESSION['login_string'];
            $username = $_SESSION['username'];
            $ip_address = $_SERVER['REMOTE_ADDR']; // Get the IP address of the user.
            $user_browser = $_SERVER['HTTP_USER_AGENT']; // Get the user-agent string of the user.

            // =====>> add this code, because cURL req comes from server. <<=====
            if (isset($_SESSION["REMOTE_ADDR"]) && ($_SERVER['REMOTE_ADDR'] == $_SERVER['SERVER_ADDR']))
            {
                $ip_address = $_SESSION["REMOTE_ADDR"];
            }
            // {rest of code}
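The IP override above can be pulled out into a small function, which makes the rule explicit: the stored address is trusted only when the request comes from the server itself. This is a sketch under my own assumptions; resolveClientIp and the sample addresses are hypothetical, and the arrays stand in for $_SERVER and $_SESSION.

```php
<?php
// Hypothetical helper: decide which IP address the login check should use.
// $server stands in for $_SERVER, $session for $_SESSION.
function resolveClientIp(array $server, array $session): string
{
    $remote = $server['REMOTE_ADDR'] ?? '';
    $self = $server['SERVER_ADDR'] ?? null;

    // The request came from this server (the cURL call), so fall back to
    // the original visitor address that was carried over into the session.
    if ($self !== null && $remote === $self && isset($session['REMOTE_ADDR'])) {
        return $session['REMOTE_ADDR'];
    }

    // Normal browser request: use the address of the connection itself
    return $remote;
}

// Assumed sample values: a request made by the server to itself
$viaCurl = resolveClientIp(
    ['REMOTE_ADDR' => '10.0.0.5', 'SERVER_ADDR' => '10.0.0.5'],
    ['REMOTE_ADDR' => '203.0.113.7']
);

// Assumed sample values: a request from an outside visitor
$viaBrowser = resolveClientIp(
    ['REMOTE_ADDR' => '198.51.100.9', 'SERVER_ADDR' => '10.0.0.5'],
    ['REMOTE_ADDR' => '203.0.113.7']
);
```

An outside visitor cannot trigger the override just by sending a REMOTE_ADDR cookie, because the connection address would not match the server address.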

Small updates to the getPHP.php file:

    <?php
    // Requires
    require_once 'assets/class.FirePDF.php';
    require_once 'assets/class.Firewizz.Security.php';

    $SECURITY = new \Firewizz\Security();
    $SECURITY->Start_Secure_Session();

    // Html file to scrape, if this works replace with referer so the page that does the request gets printed.
    // (prepend by security so it can only be done from securePlay)
    $html_file = 'http://www.secureplay.nl/?p=overzichten&sort=SpeelplaatsInspecties&s=67';

    // Output pdf filename
    $pdf_fileName = 'Test_Pdf.pdf';

    /*
     * cURL part
     */
    // create curl resource
    $ch = curl_init();

    // set source url
    curl_setopt($ch, CURLOPT_URL, $html_file);

    // set cookies
    $cookiesIn = "user_id=" . $_SESSION['user_id'] . "; username=" . $_SESSION['username'] . "; login_string=" . $_SESSION['login_string'] . "; pdfCurl=true; REMOTE_ADDR=" . $_SERVER['REMOTE_ADDR'] . "; HTTP_USER_AGENT=" . $_SERVER['HTTP_USER_AGENT'];
    $agent = $_SERVER['HTTP_USER_AGENT'];

    // set cURL Options
    $tmp = tempnam("/tmp", "CURLCOOKIE");
    if ($tmp === FALSE)
    {
        die('Could not generate a temporary cookie jar.');
    }

    $options = array(
        CURLOPT_RETURNTRANSFER => true, // return web page
        //CURLOPT_HEADER => true, //return headers in addition to content
        CURLOPT_ENCODING => "", // handle all encodings
        CURLOPT_AUTOREFERER => true, // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
        CURLOPT_TIMEOUT => 120, // timeout on response
        CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
        CURLINFO_HEADER_OUT => true,
        CURLOPT_SSL_VERIFYPEER => false, // Disabled SSL Cert checks
        CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
        CURLOPT_COOKIEJAR => $tmp,
        //CURLOPT_COOKIEFILE => $tmp,
        CURLOPT_COOKIE => $cookiesIn,
        CURLOPT_USERAGENT => $agent
    );

    // $output contains the output string
    curl_setopt_array($ch, $options);
    $output = curl_exec($ch);

    // close curl resource to free up system resources
    curl_close($ch);

    // output the cURL
    echo $output;
    ?>

With the above, you can use cURL to visit a secured page with the current session data, with only minor concessions in your security.


So your $cookiesIn needs to have your cookies defined. I'll make an example based on your code snippets:

$cookiesIn = "user_id=" . $_SESSION['user_id'] . "; username=" . $_SESSION['username'] . "; login_string=" . $_SESSION['login_string'] . ";";

Try setting that in your pdfCreator page. Replace $cookiesIn = ""; with the line above and see if that gives you a different result.
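One thing to watch when building that string by hand: a value containing a semicolon or a space (a user-agent string, for instance) would break the cookie header. Here is a minimal sketch of a helper that URL-encodes each value; buildCookieHeader and the sample values are my own assumptions, not something from your code. PHP URL-decodes cookie values again when it fills $_COOKIE on the receiving page.

```php
<?php
// Hypothetical helper: build a "name=value; name=value" string suitable
// for CURLOPT_COOKIE, URL-encoding every value so that characters such as
// ";" or spaces cannot split the header.
function buildCookieHeader(array $pairs): string
{
    $parts = [];
    foreach ($pairs as $name => $value) {
        $parts[] = $name . '=' . rawurlencode((string) $value);
    }
    return implode('; ', $parts);
}

// Assumed sample values standing in for the $_SESSION and $_SERVER entries
$cookiesIn = buildCookieHeader([
    'user_id' => 42,
    'username' => 'jan',
    'HTTP_USER_AGENT' => 'Mozilla/5.0 (X11; Linux)',
]);
```

The result can then be passed as the CURLOPT_COOKIE option in place of the hand-concatenated string.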

Also, here's a great reference for the cURL option cookie:

https://curl.haxx.se/libcurl/c/CURLOPT_COOKIE.html

If you want all cookies to just be sent instead of designating them, use this code:

    $tmp = tempnam("/tmp", "CURLCOOKIE");
    if ($tmp === FALSE) die('Could not generate a temporary cookie jar.');

    $options = array(
        CURLOPT_RETURNTRANSFER => true, // return web page
        //CURLOPT_HEADER => true, //return headers in addition to content
        CURLOPT_ENCODING => "", // handle all encodings
        CURLOPT_AUTOREFERER => true, // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
        CURLOPT_TIMEOUT => 120, // timeout on response
        CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
        CURLINFO_HEADER_OUT => true,
        CURLOPT_SSL_VERIFYPEER => false, // Disabled SSL Cert checks
        CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
        CURLOPT_COOKIEJAR => $tmp,
        CURLOPT_COOKIEFILE => $tmp,
    );

With the COOKIEJAR option, cURL writes all the cookies it currently knows about to that file when the handle is closed. Then, when we designate COOKIEFILE, we're telling cURL which file to read cookies from so they are included with the request.

That said, I've gotten rid of the $cookiesIn reference as it should not be needed if you use the code above.


In this case, provided the session control algorithm is sound, you simply want to change the format the page is sent in.

Using cURL to re-fetch the page is one way to do this, but it looks like an XY problem: you don't actually want to use cURL, you want to control the output format, either HTML or PDF.

One viable option would be to reload the page after adding a specific parameter, which is injected into the page context and changes the output function. For example, you could wrap the whole page in an output buffering bubble:

    // Security checks as usual, then:
    if (array_key_exists('output', $_GET)) {
        $format = $_GET['output']; // e.g. "pdf"
        // We could check whether the response handler has a printAs<FORMAT> method
        switch ($format) {
            case 'pdf':
                $outputFn = 'printAsPDF';
                break;
            default:
                throw new \Exception("Output in {$format} format not supported");
        }
        // Buffer the whole page and hand it to the chosen callback on flush
        ob_start($outputFn);
    }
    // Page is generated normally

The 'printAsPDF' output callback will receive the page contents, and would use something like dompdf or wkhtmltopdf to format them as a PDF file, add the appropriate Content-Type header, and return the formatted PDF.
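As a rough sketch of what such a callback could look like: the convertHtmlToPdf function below is only a placeholder I made up so the example runs on its own, and a real version would call dompdf or wkhtmltopdf there instead. The outer buffer exists only so the demo can capture the result; on a real page you would need just the single ob_start('printAsPDF').

```php
<?php
// Placeholder converter (assumption): a real implementation would hand the
// HTML to a PDF library and return the PDF bytes.
function convertHtmlToPdf(string $html): string
{
    return '[PDF placeholder, ' . strlen($html) . ' bytes of HTML]';
}

// Output callback for ob_start(): receives the buffered page and returns
// whatever should be sent to the client instead.
function printAsPDF(string $buffer, int $phase): string
{
    // Headers can still be changed here as long as nothing was sent yet
    if (!headers_sent()) {
        header('Content-Type: application/pdf');
    }
    return convertHtmlToPdf($buffer);
}

ob_start();              // outer buffer: only captures the result for this demo
ob_start('printAsPDF');  // inner buffer: plays the role of the page wrapper
echo '<h1>Report</h1>';  // the page is generated normally
ob_end_flush();          // runs printAsPDF and passes its return value outward
$result = ob_get_clean();
```

Since no chunk size is given to ob_start(), the callback sees the complete page in one call when the buffer is flushed.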

Security stays the same, and the modification can actually be implemented at the request decoding stage. A state variable with the currently used output format may be made accessible to other objects, which empowers them to behave differently depending on the situation (for example, a generateMenu() function might choose to immediately return instead of showing something which wouldn't make sense in a PDF).
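A minimal sketch of that state variable, assuming a small value object is passed to whatever renders parts of the page. The OutputContext class and the menu markup are hypothetical names for illustration; only generateMenu() is mentioned above, and its body here is my own guess at how it might behave.

```php
<?php
// Hypothetical state object holding the output format for this request
final class OutputContext
{
    private string $format;

    public function __construct(string $format = 'html')
    {
        $this->format = $format;
    }

    public function isPdf(): bool
    {
        return $this->format === 'pdf';
    }
}

// Page components can ask the context how they should behave
function generateMenu(OutputContext $context): string
{
    // A navigation menu makes no sense in a PDF, so return immediately
    if ($context->isPdf()) {
        return '';
    }
    return '<nav><a href="/">Home</a></nav>';
}

$htmlMenu = generateMenu(new OutputContext());
$pdfMenu = generateMenu(new OutputContext('pdf'));
```

The context would be created once at the request decoding stage, from the same output parameter that selects the callback.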