Sanitizing user's data in GET by PHP


How do you sanitize data in $_GET variables with PHP?

You do not sanitize data in $_GET. This is a common approach in PHP scripts, but it's completely wrong*.

All your variables should stay in plain text form until the point when you embed them in another type of string. There is no one form of escaping or ‘sanitization’ that can cover all possible types of string you might be embedding your values into.

So if you're embedding a string into an SQL query, you need to escape it on the way out:

$sql= "SELECT * FROM accounts WHERE username='".pg_escape_string($_GET['username'])."'";

And if you're spitting the string out into HTML, you need to escape it then:

Cannot log in as <?php echo(htmlspecialchars($_GET['username'], ENT_QUOTES)) ?>.

If you did both of these escaping steps on the $_GET array at the start, as recommended by people who don't know what they're doing:

$_GET['username']= htmlspecialchars(pg_escape_string($_GET['username']));

Then when you had a ‘&’ in your username, it would mysteriously turn into ‘&amp;’ in your database, and if you had an apostrophe in your username, it would turn into two apostrophes on the page. And when you have a form with these characters in it, it is easy to end up double-escaping things when they're edited, which is why so many bad PHP CMSs end up with broken article titles like “New books from O\\\\\\\\\\\\\\\\\\\'Reilly”.
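A minimal sketch of how the damage compounds when a value passes through the same escaping step more than once. addslashes() stands in here for whatever slash-escaping an application applies up front, and the variable names are only illustrative:

```php
<?php
// Escaping for HTML up front, then again at output time, double-encodes.
$name  = 'Tom & Jerry';
$once  = htmlspecialchars($name, ENT_QUOTES);  // Tom &amp; Jerry
$twice = htmlspecialchars($once, ENT_QUOTES);  // Tom &amp;amp; Jerry

// Slash-escaping an already slash-escaped value multiplies the backslashes.
$title = "O'Reilly";
$pass1 = addslashes($title);   // O\'Reilly
$pass2 = addslashes($pass1);   // O\\\'Reilly

echo $twice, "\n", $pass2, "\n";
```

Each save-and-edit cycle adds another layer, which is how the runaway backslashes in the title above come about.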

Naturally, remembering to pg_escape_string or mysql_real_escape_string, and htmlspecialchars every time you send a variable out is a bit tedious, which is why everyone wants to do it (incorrectly) in one place at the start of the script. For HTML output, you can at least save some typing by defining a function with a short name that does echo(htmlspecialchars(...)).
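One way such a helper might look. The name h() and the choice of ENT_QUOTES with UTF-8 are assumptions made for illustration, not something PHP defines:

```php
<?php
// Hypothetical short-named helper: HTML-escape a value for output.
function h(string $text): string
{
    // ENT_QUOTES also escapes single quotes, so the result is usable inside
    // attribute values delimited with either quote style.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Usage in a template; the raw value stays untouched until this point.
$username = $_GET['username'] ?? '';
echo 'Cannot log in as ', h($username), ".\n";
```

This version returns the escaped string rather than echoing it, so it can also be used inside string concatenation; an echoing variant would work just as well in templates.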

For SQL, you're better off using parameterised queries. For Postgres there's pg_query_params. Or indeed, prepared statements as you mentioned (though I personally find them less manageable). Either way, you can then forget about ‘sanitizing’ or escaping for SQL, but you must still escape if you embed in other types of string including HTML.
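A rough sketch of the pg_query_params approach, reusing the accounts query from earlier. The function name find_account is hypothetical, and $conn is assumed to be a connection obtained from pg_connect():

```php
<?php
// Hypothetical helper: returns the account row for a username, or null.
// $conn is assumed to be an open connection from pg_connect().
function find_account($conn, string $username): ?array
{
    // $1 is a placeholder; the value travels separately from the SQL text,
    // so no escaping function is needed here.
    $result = pg_query_params(
        $conn,
        'SELECT * FROM accounts WHERE username = $1',
        [$username]
    );
    if ($result === false) {
        return null;  // query failed
    }
    $row = pg_fetch_assoc($result);
    return $row === false ? null : $row;  // no matching row
}
```

The raw $_GET['username'] can be passed straight in. If any field of the returned row is later printed into a page, it still needs htmlspecialchars at that point.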

strip_tags() is not a good way of treating input for HTML display. In the past it has had security problems, as browser parsers are actually much more complicated in their interpretation of what a tag can be than you might think. htmlspecialchars() is almost always the right thing to use instead, so that if someone types a less-than sign they'll actually get a literal less-than sign and not find half their text mysteriously vanishing.
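A small comparison on an input containing a stray less-than sign; the sample string is made up for illustration:

```php
<?php
$comment = 'Use x<y when comparing';

// strip_tags() may treat "<y when comparing" as the start of a tag
// and discard it.
$stripped = strip_tags($comment);

// htmlspecialchars() keeps every character and encodes the "<".
$escaped = htmlspecialchars($comment, ENT_QUOTES);

echo $stripped, "\n", $escaped, "\n";
```

Printed into a page, the second value shows the user exactly what they typed.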

(*: as a general approach to solving injection problems, anyway. Naturally there are domain-specific checks it is worth doing on particular fields, and there are useful cleanup tasks you can do like removing all control characters from submitted values. But this is not what most PHP coders mean by sanitization.)


If you're talking about sanitizing output, I would recommend storing content in your database in its full, unescaped form, and then escaping it (with htmlspecialchars or similar) when you echo the data out. That way you have more options for outputting. See this question for a discussion of sanitising/escaping database content.

In terms of storing in Postgres, use pg_escape_string on each variable in the query to escape quotes and generally protect against SQL injection.

Edit:

My usual steps for storing data in a database, and then retrieving it, are:

  1. Call the database's escaping function (pg_escape_string, mysql_escape_string, etc.) on each incoming $_GET variable used in your query. Note that using these functions instead of addslashes means no extra slashes end up in the text stored in the database.

  2. When you get the data back out of the database, you can just use htmlspecialchars on any outputted data. There is no need to use stripslashes, since there should be no extra slashes.


You must sanitize all requests, GET as well as POST.

You can use the function htmlentities(), the function preg_replace() with regex, or filter by cast:

<?php $id = (int)$_GET['id']; ?>
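Note that a cast never fails: (int)'42abc' quietly gives 42 and (int)'abc' gives 0. If you would rather reject such input, a sketch along these lines using filter_var may help. The helper name parse_id and the rule that ids start at 1 are assumptions for illustration:

```php
<?php
// Hypothetical helper: returns a positive integer id, or null if the
// parameter is missing or is not a clean integer.
function parse_id(array $query): ?int
{
    $id = filter_var(
        $query['id'] ?? null,
        FILTER_VALIDATE_INT,
        ['options' => ['min_range' => 1]]
    );
    return $id === false ? null : $id;
}

$id = parse_id($_GET);
if ($id === null) {
    echo "Invalid id\n";
}
```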

[]'s