
How to detect browser spoofing and robots from a user agent string in php


In addition to filtering keywords in the user agent string, I have had luck with putting a hidden honeypot link on all pages:

<a style="display:none" href="autocatch.php">A</a>

Then in "autocatch.php" record the session (or IP address) as a bot. This link is invisible to users but it's hidden characteristic would hopefully not be realized by bots. Taking the style attribute out and putting it into a CSS file might help even more.


Because, as previously stated, both user agents and IP addresses can be spoofed, neither can be used for reliable bot detection.

I work for a security company and our bot detection algorithm looks something like this:

  1. Step 1 - Gathering data:

    a. Cross-check the user agent against the IP (both need to be consistent).

    b. Check header parameters (what is missing, what order they appear in, etc.)

    c. Check behavior (early access to and compliance with robots.txt, general behavior, number of pages visited, visit rates, etc.)

  2. Step 2 - Classification:

    By cross-verifying the data, the visitor is classified as "Good", "Bad" or "Suspicious".

  3. Step 3 - Active Challenges:

    Suspicious bots undergo the following challenges (a rough PHP sketch follows this list):

    a. JS challenge (can it execute JavaScript?)

    b. Cookie challenge (can it accept cookies?)

    c. If still not conclusive -> CAPTCHA
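
As a rough illustration of the cookie and JS challenges in PHP (the cookie name and the flow are made up for this sketch; a real service does considerably more):

<?php
// challenge.php -- naive cookie + JS challenge (illustrative names only).
session_start();

if (!isset($_COOKIE['challenge'])) {
    // Set a cookie and ask the browser to reload the page via JavaScript.
    // A client that cannot store cookies or execute JS never returns with
    // the cookie and remains "suspicious".
    setcookie('challenge', bin2hex(random_bytes(8)), time() + 300);
    echo '<script>location.reload();</script>';
    echo '<noscript>Please enable JavaScript and cookies.</noscript>';
    exit;
}

// The cookie came back, so the client accepted cookies and executed the
// reload script; treat it as a browser. Otherwise fall back to a CAPTCHA.
$_SESSION['passed_challenge'] = true;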

This filtering mechanism is VERY effective, but I don't think it could be replicated by a single person or even an unspecialized provider (for one thing, the challenges and the bot DB need to be constantly updated by a security team).

We offer a sort of "do it yourself" tool in the form of Botopedia.org, our directory, which can be used for IP/user-name cross-verification, but for a truly effective solution you will have to rely on specialized services.

There are several free bot monitoring solutions, including our own, and most use the same strategy I've described above (or a similar one).

GL


Beyond just comparing user agents, you would keep a log of activity and look for robot behavior. Often this includes requests for /robots.txt and never loading images. Another trick is to ask the client whether it has JavaScript enabled, since most bots won't report it as enabled.
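
A simple way to start collecting that kind of activity log in PHP is sketched below; the SQLite file and table layout are assumptions for illustration:

<?php
// log_request.php -- include at the top of every page (and in handlers for
// robots.txt and image requests) to build a per-IP behavior profile.
$db = new PDO('sqlite:' . __DIR__ . '/requests.sqlite');
$db->exec('CREATE TABLE IF NOT EXISTS hits (ip TEXT, path TEXT, ua TEXT, ts INTEGER)');

$stmt = $db->prepare('INSERT INTO hits (ip, path, ua, ts) VALUES (?, ?, ?, ?)');
$stmt->execute([
    $_SERVER['REMOTE_ADDR'],
    $_SERVER['REQUEST_URI'],
    isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '',
    time(),
]);

// Later analysis: an IP that fetched /robots.txt, never requested an image,
// and hit many pages per minute is probably a bot.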

However, beware: you may well accidentally flag some visitors who are genuinely people.