Easy way to test a URL for 404 in PHP?


If you are using PHP's curl bindings, you can check the HTTP status code using curl_getinfo as such:

$handle = curl_init($url);
curl_setopt($handle, CURLOPT_RETURNTRANSFER, TRUE);
/* Get the HTML or whatever is linked in $url. */
$response = curl_exec($handle);
/* Check for 404 (file not found). */
$httpCode = curl_getinfo($handle, CURLINFO_HTTP_CODE);
if ($httpCode == 404) {
    /* Handle 404 here. */
}
curl_close($handle);
/* Handle $response here. */
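If only the status code matters, downloading the whole body is unnecessary. A minimal sketch, assuming the curl extension is loaded: the helper name url_status_code and its timeout parameter are made up for illustration, and CURLOPT_NOBODY makes curl send a HEAD request. Some servers answer HEAD differently from GET, so treat this as an optimization to verify against your targets.

```php
<?php
/**
 * Return the HTTP status code for $url, or null when the request itself
 * failed (malformed URL, DNS error, refused connection, timeout).
 * Hypothetical helper: the name and the timeout default are assumptions.
 */
function url_status_code($url, $timeoutSeconds = 10)
{
    $handle = curl_init($url);
    if ($handle === false) {
        return null;
    }

    curl_setopt($handle, CURLOPT_NOBODY, true);          // HEAD request: headers only
    curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);  // do not echo the response
    curl_setopt($handle, CURLOPT_CONNECTTIMEOUT, $timeoutSeconds);
    curl_setopt($handle, CURLOPT_TIMEOUT, $timeoutSeconds);

    // curl_exec() returns false on transport errors; no status code exists then.
    if (curl_exec($handle) === false) {
        return null;
    }

    return (int) curl_getinfo($handle, CURLINFO_HTTP_CODE);
}
```

With this in place, the check becomes url_status_code($url) === 404, and a null result can be handled separately as "could not reach the server".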


If you're running PHP 5 or later, you can use get_headers():

$url = 'http://www.example.com';
print_r(get_headers($url, 1));

Alternatively, for PHP 4, a user has contributed the following emulation:

/**
 * This is a modified version of code from "stuart at sixletterwords dot com",
 * at 14-Sep-2005 04:52. This version tries to emulate the get_headers()
 * function in PHP 4. I think it works fairly well, and is simple. It is not
 * the best emulation available, but it works.
 *
 * Features:
 * - supports (and requires) full URLs.
 * - supports changing of default port in URL.
 * - stops downloading from socket as soon as end-of-headers is detected.
 *
 * Limitations:
 * - only gets the root URL (see line with "GET / HTTP/1.1").
 * - doesn't support HTTPS (nor the default HTTPS port).
 */
if(!function_exists('get_headers'))
{
    function get_headers($url,$format=0)
    {
        $url=parse_url($url);
        $end = "\r\n\r\n";
        $fp = fsockopen($url['host'], (empty($url['port'])?80:$url['port']), $errno, $errstr, 30);
        if ($fp)
        {
            $out  = "GET / HTTP/1.1\r\n";
            $out .= "Host: ".$url['host']."\r\n";
            $out .= "Connection: Close\r\n\r\n";
            $var  = '';
            fwrite($fp, $out);
            while (!feof($fp))
            {
                $var.=fgets($fp, 1280);
                if(strpos($var,$end))
                    break;
            }
            fclose($fp);
            $var=preg_replace("/\r\n\r\n.*\$/",'',$var);
            $var=explode("\r\n",$var);
            if($format)
            {
                $v=array(); // initialise so an array is returned even if nothing matches
                foreach($var as $i)
                {
                    if(preg_match('/^([a-zA-Z -]+): +(.*)$/',$i,$parts))
                        $v[$parts[1]]=$parts[2];
                }
                return $v;
            }
            else
                return $var;
        }
    }
}

Both would have a result similar to:

Array
(
    [0] => HTTP/1.1 200 OK
    [Date] => Sat, 29 May 2004 12:28:14 GMT
    [Server] => Apache/1.3.27 (Unix)  (Red-Hat/Linux)
    [Last-Modified] => Wed, 08 Jan 2003 23:11:55 GMT
    [ETag] => "3f80f-1b6-3e1cb03b"
    [Accept-Ranges] => bytes
    [Content-Length] => 438
    [Connection] => close
    [Content-Type] => text/html
)

Therefore you could just check that the status line of the response is OK, e.g.:

$headers = get_headers($url, 1);
if ($headers[0] == 'HTTP/1.1 200 OK') {
    // valid
}
if ($headers[0] == 'HTTP/1.1 301 Moved Permanently') {
    // moved or redirect page
}
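Comparing the whole status line is brittle, because the protocol version and the reason phrase vary between servers (for example "HTTP/1.0 200 OK"). A small sketch that pulls out only the numeric code; the helper name status_code_from_line is hypothetical and not part of PHP:

```php
<?php
/**
 * Extract the numeric status code from an HTTP status line such as
 * "HTTP/1.1 404 Not Found". Returns null if the line is not a status line.
 * Hypothetical helper, shown for illustration only.
 */
function status_code_from_line($statusLine)
{
    // Protocol version, whitespace, then exactly three digits.
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})\b#', $statusLine, $matches)) {
        return (int) $matches[1];
    }
    return null;
}
```

The check above could then be written as status_code_from_line($headers[0]) === 404, regardless of the HTTP version or reason phrase the server sends.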

W3C Codes and Definitions


With strager's code, you can also check CURLINFO_HTTP_CODE for other codes. Some websites do not report a 404; instead they redirect to a custom 404 page and return 302 (redirect) or something similar. I used this to check whether an actual file (e.g. robots.txt) existed on the server. Such a file would not cause a redirect if it existed, but if it didn't, the request would be redirected to a 404 page, which, as I said before, may not have a 404 code. The function below therefore treats any status outside the 2xx range as "not found":

function is_404($url) {
    $handle = curl_init($url);
    curl_setopt($handle, CURLOPT_RETURNTRANSFER, TRUE);
    /* Get the HTML or whatever is linked in $url. */
    $response = curl_exec($handle);
    /* Check for 404 (file not found). */
    $httpCode = curl_getinfo($handle, CURLINFO_HTTP_CODE);
    curl_close($handle);
    /* If the document has loaded successfully without any redirection or error */
    if ($httpCode >= 200 && $httpCode < 300) {
        return false;
    } else {
        return true;
    }
}