The cURL User Agent
The user agent can be faked when using cURL, so there is no point relying on the User-Agent header to decide whether a request came from cURL.
For example: I recently wrote an app which gets the PageRank of a URL from Google. Now Google doesn't like this, so it allows only a certain user agent to access its PageRank servers. Solution? Spoof the user agent using cURL and Google will be none the wiser.
Moral of the story: cURL user agents are JUST NOT reliable.
If you still want to do this, you should be able to read the user agent that was sent just as you would for any other request:
$userAgent=$_SERVER['HTTP_USER_AGENT'];
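Since a client may leave the header out entirely, reading the key directly can raise an undefined-index notice. A minimal sketch of a defensive read follows; the helper name getUserAgent is hypothetical and not part of PHP or cURL.

```php
<?php
// Hypothetical helper (not a PHP built-in): returns the User-Agent header
// from a $_SERVER-style array, or null when the client sent none.
function getUserAgent(array $server): ?string
{
    return isset($server['HTTP_USER_AGENT'])
        ? (string) $server['HTTP_USER_AGENT']
        : null;
}

// Typical use inside a request handler.
$userAgent = getUserAgent($_SERVER);
```

A null result only means the header was missing; it does not by itself identify the client as cURL.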
EDIT: A quick test confirmed this:
dumpx.php:
<?php
$url = "http://localhost/dump.php";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
// Only set a custom user agent when ?u=y is passed
if (isset($_GET['u']) && $_GET['u'] === 'y') {
    curl_setopt($ch, CURLOPT_USERAGENT, "booyah!");
}
// 0 = print the response directly instead of returning it
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 0);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 2);
curl_setopt($ch, CURLOPT_TIMEOUT, 60);
//curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'GET');
curl_setopt($ch, CURLOPT_HEADER, 0);
$exec = curl_exec($ch);
curl_close($ch);
?>
dump.php:
<?php var_dump($_SERVER);?>
Case 1: http://localhost/dumpx.php?u=y
'HTTP_USER_AGENT' => string 'booyah!' (length=7)
Case 2: http://localhost/dumpx.php?u=n
No $_SERVER['HTTP_USER_AGENT']
This shows that PHP's cURL extension has no default user agent: unless CURLOPT_USERAGENT is set, no User-Agent header is sent at all. (The curl command-line tool is different and sends curl/<version> by default.)
If you want to detect bots, you cannot rely on the user agent. Better practices are:
- Check that your visitor runs JavaScript (though not all human users do either).
- Check that your visitor loads the additional files linked from the webpage (CSS, images, etc.).
- Check the timing of the visitor's requests. Humans usually don't load 10 pages per second.
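The timing check in the last item could be sketched roughly as below. The function name isTooFast, its parameters, and the default limit of 10 requests per second are illustrative assumptions taken from the figure above, not a recommended threshold; how the timestamps are stored per visitor (session, cache, database) is left open.

```php
<?php
// Hypothetical helper: given the timestamps (in seconds) of a visitor's
// earlier requests, report whether at least $maxRequests of them fall
// inside the last $windowSeconds before $now.
function isTooFast(
    array $requestTimes,
    float $now,
    int $maxRequests = 10,
    float $windowSeconds = 1.0
): bool {
    $recent = array_filter(
        $requestTimes,
        function ($t) use ($now, $windowSeconds) {
            // Keep only requests inside the sliding window.
            return $t > $now - $windowSeconds && $t <= $now;
        }
    );

    return count($recent) >= $maxRequests;
}
```

In a live handler, $now would typically come from microtime(true). A slow, patient bot passes this check, so it is one signal among several rather than proof either way.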
cURL stands for Client URL Library, and the whole point of it is to be able to make requests identical to those a client would make.
The only thing you can do is inspect the information that is part of the request, such as the IP address, HTTP request headers, cookies (including any session ID cookie), the URL (path/page), and any POST/GET data. If the person using cURL makes the request from an expected IP address and supplies the expected header/cookie/token/URL/POST/GET values, you will not be able to distinguish a cURL request from one made by a browser.
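To make that concrete, here is a rough sketch of collecting everything the server actually sees about a request. The function name describeRequest and the shape of the returned array are assumptions for illustration; the $_SERVER keys used are standard ones, though REMOTE_ADDR may hold a proxy's address depending on the setup.

```php
<?php
// Hypothetical helper: summarise the client-supplied parts of a request
// from a $_SERVER-style array. Every value here can be chosen by the
// sender, whether it is a browser or cURL.
function describeRequest(array $server): array
{
    $headers = [];
    foreach ($server as $key => $value) {
        // PHP exposes request headers as HTTP_* entries.
        if (strpos($key, 'HTTP_') === 0) {
            $headers[substr($key, 5)] = $value;
        }
    }

    return [
        'ip'      => $server['REMOTE_ADDR'] ?? null,
        'method'  => $server['REQUEST_METHOD'] ?? null,
        'uri'     => $server['REQUEST_URI'] ?? null,
        'headers' => $headers,
    ];
}
```

Calling describeRequest($_SERVER) for a browser request and for a cURL request that copies the same headers and cookies would yield the same summary, which is why the two cannot be told apart from this data alone.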