Easy way to test a URL for 404 in PHP?
If you are using PHP's curl bindings, you can check the HTTP status code using curl_getinfo, like so:
$handle = curl_init($url);
curl_setopt($handle, CURLOPT_RETURNTRANSFER, TRUE);

/* Get the HTML or whatever is linked in $url. */
$response = curl_exec($handle);

/* Check for 404 (file not found). */
$httpCode = curl_getinfo($handle, CURLINFO_HTTP_CODE);
if($httpCode == 404) {
    /* Handle 404 here. */
}

curl_close($handle);

/* Handle $response here. */
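If you only need the status code and not the body, a HEAD request avoids downloading the page. Here is a minimal sketch of that idea. The function name http_status and the timeout values are my own assumptions, not part of the code above, and some servers answer HEAD differently from GET, so treat the result as a hint.

```php
<?php
/**
 * Hypothetical helper (not from the answer above): return the HTTP status
 * code for $url using a HEAD request, or 0 if no response was received.
 */
function http_status(string $url): int
{
    $handle = curl_init($url);
    if ($handle === false) {
        return 0;
    }
    curl_setopt_array($handle, [
        CURLOPT_NOBODY         => true,  // send HEAD, skip the response body
        CURLOPT_RETURNTRANSFER => true,  // do not echo anything to output
        CURLOPT_CONNECTTIMEOUT => 5,     // assumed timeouts, tune as needed
        CURLOPT_TIMEOUT        => 10,
    ]);
    if (curl_exec($handle) === false) {
        return 0;                        // DNS failure, refused connection, timeout...
    }
    return (int) curl_getinfo($handle, CURLINFO_HTTP_CODE);
}
```

You would then write something like if (http_status($url) === 404) { ... }, and handle 0 separately as "could not reach the server".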
If you're running PHP 5 or later, you can use:
$url = 'http://www.example.com';
print_r(get_headers($url, 1));
Alternatively, for PHP 4, a user has contributed the following:
/**
This is a modified version of code from "stuart at sixletterwords dot com", at 14-Sep-2005 04:52.
This version tries to emulate get_headers() function at PHP4.
I think it works fairly well, and is simple.
It is not the best emulation available, but it works.

Features:
- supports (and requires) full URLs.
- supports changing of default port in URL.
- stops downloading from socket as soon as end-of-headers is detected.

Limitations:
- only gets the root URL (see line with "GET / HTTP/1.1").
- don't support HTTPS (nor the default HTTPS port).
*/
if(!function_exists('get_headers'))
{
    function get_headers($url,$format=0)
    {
        $url=parse_url($url);
        $end = "\r\n\r\n";
        $fp = fsockopen($url['host'], (empty($url['port'])?80:$url['port']), $errno, $errstr, 30);
        if ($fp)
        {
            $out  = "GET / HTTP/1.1\r\n";
            $out .= "Host: ".$url['host']."\r\n";
            $out .= "Connection: Close\r\n\r\n";
            $var  = '';
            fwrite($fp, $out);
            while (!feof($fp))
            {
                $var.=fgets($fp, 1280);
                if(strpos($var,$end))
                    break;
            }
            fclose($fp);

            $var=preg_replace("/\r\n\r\n.*\$/",'',$var);
            $var=explode("\r\n",$var);
            if($format)
            {
                foreach($var as $i)
                {
                    if(preg_match('/^([a-zA-Z -]+): +(.*)$/',$i,$parts))
                        $v[$parts[1]]=$parts[2];
                }
                return $v;
            }
            else
                return $var;
        }
    }
}
Both would have a result similar to:
Array
(
    [0] => HTTP/1.1 200 OK
    [Date] => Sat, 29 May 2004 12:28:14 GMT
    [Server] => Apache/1.3.27 (Unix) (Red-Hat/Linux)
    [Last-Modified] => Wed, 08 Jan 2003 23:11:55 GMT
    [ETag] => "3f80f-1b6-3e1cb03b"
    [Accept-Ranges] => bytes
    [Content-Length] => 438
    [Connection] => close
    [Content-Type] => text/html
)
You can therefore check the status line of the response, e.g.:
$headers = get_headers($url, 1);
if ($headers[0] == 'HTTP/1.1 200 OK') {
    //valid
}
if ($headers[0] == 'HTTP/1.1 301 Moved Permanently') {
    //moved or redirect page
}
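Comparing the whole status line is brittle: the protocol version and reason phrase vary between servers, and when get_headers() follows a redirect the array holds one status line per hop. Here is a rough sketch that pulls out just the numeric code of the last status line. The name final_status_code is hypothetical, something I made up for illustration.

```php
<?php
/**
 * Hypothetical helper: extract the numeric status code from the array
 * returned by get_headers(). When redirects are followed, the array holds
 * one status line per hop, so the last one describes the final response.
 * Returns null if no status line is found.
 */
function final_status_code(array $headers): ?int
{
    $code = null;
    foreach ($headers as $key => $value) {
        // Status lines sit under integer keys; named headers use string keys.
        if (is_int($key) && is_string($value)
            && preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $value, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}
```

Usage would be along the lines of final_status_code(get_headers($url, 1)) === 404, after first checking that get_headers() did not return false.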
With strager's code, you can also check the CURLINFO_HTTP_CODE for other codes. Some websites do not report a 404, rather they simply redirect to a custom 404 page and return 302 (redirect) or something similar. I used this to check if an actual file (eg. robots.txt) existed on the server or not. Clearly this kind of file would not cause a redirect if it existed, but if it didn't it would redirect to a 404 page, which as I said before may not have a 404 code.
function is_404($url) {
    $handle = curl_init($url);
    curl_setopt($handle, CURLOPT_RETURNTRANSFER, TRUE);

    /* Get the HTML or whatever is linked in $url. */
    $response = curl_exec($handle);

    /* Check for 404 (file not found). */
    $httpCode = curl_getinfo($handle, CURLINFO_HTTP_CODE);
    curl_close($handle);

    /* If the document has loaded successfully without any redirection or error */
    if ($httpCode >= 200 && $httpCode < 300) {
        return false;
    } else {
        return true;
    }
}
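Note that the function above also returns true when the server could not be reached at all, since cURL reports a status code of 0 in that case. If you want to tell those situations apart, something like the following sketch may help. The function name classify_status and the category labels are assumptions of mine, not part of the original code.

```php
<?php
/**
 * Hypothetical helper: map a status code (as returned by
 * curl_getinfo($handle, CURLINFO_HTTP_CODE)) to a coarse category.
 */
function classify_status(int $httpCode): string
{
    if ($httpCode === 0) {
        return 'no-response';   // cURL reports 0 when no HTTP response was received
    }
    if ($httpCode >= 200 && $httpCode < 300) {
        return 'ok';
    }
    if ($httpCode >= 300 && $httpCode < 400) {
        return 'redirect';      // possibly a redirect to a custom error page
    }
    if ($httpCode === 404 || $httpCode === 410) {
        return 'missing';
    }
    return 'error';             // anything else
}
```

For the robots.txt case described above, both 'redirect' and 'missing' would count as "file does not exist", while 'no-response' probably deserves a retry instead.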