
MySQL outputs Western encoding in UTF-8 PHP file


Use mysqli_set_charset to change the client encoding to UTF-8 just after you connect:

$mysqli->set_charset("utf8");

The client encoding is the encoding MySQL expects your input to be in (e.g. when you insert user-supplied text into a search query) and the encoding it returns results in (so it has to match your output encoding for echo to display things correctly).

It therefore has to match the encoding of your web page, to cover both scenarios above, and the encoding of your PHP source files, so that the hard-coded parts of your queries are interpreted correctly.
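To see why the output direction matters, here is a small PHP simulation (no database is involved, and the value "Café" is only an example) of what a UTF-8 page receives when the stored text is UTF-8 but the connection encoding is latin1. `mb_convert_encoding` stands in for the conversion MySQL would perform on the way out.

```php
<?php
// Simulation only: assume the column holds correct UTF-8 text.
$stored = "Café";

// With a latin1 client encoding, MySQL converts results to latin1 before sending them,
// so PHP receives a single byte (0xE9) for "é". mb_convert_encoding mimics that step here.
$receivedOverLatin1 = mb_convert_encoding($stored, 'ISO-8859-1', 'UTF-8');

// Echoed into a page served as UTF-8, the lone 0xE9 byte is an invalid sequence,
// which browsers typically render as a replacement character.
var_dump(mb_check_encoding($receivedOverLatin1, 'UTF-8')); // bool(false)

// With the client encoding set to utf8, the bytes arrive unchanged and are valid UTF-8.
var_dump(mb_check_encoding($stored, 'UTF-8'));             // bool(true)
```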

Update: How to convert data inserted using latin1 to UTF-8

For data that has already been inserted using the wrong connection encoding, there is a convenient way to fix the problem. For each column that contains this kind of data, run:

ALTER TABLE table_name MODIFY column_name existing_column_type CHARACTER SET latin1;
ALTER TABLE table_name MODIFY column_name BLOB;
ALTER TABLE table_name MODIFY column_name existing_column_type CHARACTER SET utf8;

The placeholders table_name, column_name and existing_column_type should be replaced with the correct values from your database each time.
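If several columns need fixing, a small PHP helper can generate the three statements for each one so they are not typed by hand. This is a sketch: the function name `buildRepairStatements` and the table and column names are hypothetical, identifiers are wrapped in backticks but not otherwise escaped (so they must come from a trusted list), and `$type` is the bare column type such as `TEXT` or `VARCHAR(255)`. The `$binaryType` parameter defaults to BLOB as in the answer; if MySQL rejects BLOB for an indexed `VARCHAR(n)` column, passing `VARBINARY(n)` is a commonly used alternative (an assumption, so check it against your schema).

```php
<?php
/**
 * Hypothetical helper: returns the three ALTER TABLE statements described above
 * (latin1, then binary, then utf8) for a single column.
 */
function buildRepairStatements(string $table, string $column, string $type, string $binaryType = 'BLOB'): array
{
    $t = "`$table`";
    $c = "`$column`";
    return [
        "ALTER TABLE $t MODIFY $c $type CHARACTER SET latin1;",
        "ALTER TABLE $t MODIFY $c $binaryType;",
        "ALTER TABLE $t MODIFY $c $type CHARACTER SET utf8;",
    ];
}

// Hypothetical columns that contain only incorrectly inserted data.
$columns = [
    ['articles', 'title', 'VARCHAR(255)'],
    ['articles', 'body', 'TEXT'],
];

foreach ($columns as [$table, $column, $type]) {
    foreach (buildRepairStatements($table, $column, $type) as $sql) {
        echo $sql, PHP_EOL;
    }
}
```

The output can be reviewed and then pasted into the mysql client, which keeps a human in the loop before anything is altered.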

What this does is

  1. Tell MySQL that the data in that column should be stored as latin1. This character set covers only a small subset of utf8, so in general this conversion loses data, but in this specific scenario the data was already interpreted as latin1 on input, so nothing is lost. In doing so, MySQL internally converts the byte representation of your data back to exactly what PHP originally sent.
  2. Convert the column to a binary type (BLOB) that has no associated encoding information. At this point the column will contain raw bytes that are a proper utf8 character string.
  3. Convert the column back to its previous character type, telling MySQL that the raw bytes should be treated as utf8.
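The same three steps can be traced at the byte level in plain PHP. This sketch assumes a UTF-8 page sent "Café" over a latin1 connection into a utf8 column, so each UTF-8 byte was stored as a separate latin1 character. It uses ISO-8859-1 as a stand-in for MySQL's latin1 (which is really cp1252); the two agree for "é" but not for every byte.

```php
<?php
// What PHP originally sent: UTF-8 bytes 43 61 66 C3 A9.
$original = "Café";

// What the utf8 column ended up holding: each byte read as a latin1 character ("CafÃ©").
$doubleEncoded = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

// Step 1: converting the column to latin1 maps each character back to one byte,
// recovering the bytes PHP originally sent.
$bytes = mb_convert_encoding($doubleEncoded, 'ISO-8859-1', 'UTF-8');

// Steps 2 and 3: the BLOB round trip keeps those bytes and relabels them as utf8.
// PHP strings are just bytes, so $bytes already is the repaired UTF-8 text.
var_dump(bin2hex($doubleEncoded)); // string(14) "436166c383c2a9"
var_dump($bytes === $original);    // bool(true)
```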

WARNING: You can only use this indiscriminate approach if the column in question contains only incorrectly inserted data. Any data that has been correctly inserted will be truncated at the first occurrence of any non-ASCII character!

Therefore it's a good idea to do this right now, before the PHP-side fix goes into effect and correctly encoded rows start arriving.
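If you are not sure whether a column contains only incorrectly inserted data, one option is to SELECT its values and test them in PHP before running the ALTER sequence. The function below is a heuristic sketch, not part of the original answer: `looksDoubleEncoded` is a hypothetical name, ISO-8859-1 approximates MySQL's latin1, and a correctly stored string that genuinely contains text like "Ã©" would be a false positive.

```php
<?php
/**
 * Heuristic: true when a UTF-8 string looks like UTF-8 bytes that were
 * mis-read as latin1 (e.g. "CafÃ©" instead of "Café").
 */
function looksDoubleEncoded(string $s): bool
{
    if (!mb_check_encoding($s, 'UTF-8')) {
        return false; // not valid UTF-8 at all
    }
    // Undo one layer: map each character back to a single latin1 byte.
    $bytes = mb_convert_encoding($s, 'ISO-8859-1', 'UTF-8');
    // Characters with no latin1 equivalent become '?', so the round trip fails for them.
    if (mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1') !== $s) {
        return false;
    }
    // Pure ASCII is unchanged; otherwise the recovered bytes must themselves be valid UTF-8.
    return $bytes !== $s && mb_check_encoding($bytes, 'UTF-8');
}

foreach (["CafÃ©", "Café", "Cafe"] as $value) {
    printf("%s => %s\n", $value, looksDoubleEncoded($value) ? 'double-encoded' : 'looks fine');
}
```

A column where every non-ASCII value reports "double-encoded" is a reasonable candidate for the conversion above; any "looks fine" value with non-ASCII characters is one that the conversion would damage.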


Use the mysqli::set_charset function:

$mysqli->set_charset('utf8'); //returns false if the encoding was not valid... won't happen

http://php.net/manual/en/mysqli.set-charset.php

I haven't used mysqli for some time, but if things are the same, connections by default use MySQL's latin1 character set (essentially ISO-8859-1) with the latin1_swedish_ci collation, hence "latin swedish".

I'll assume your page already declares UTF-8 by having:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>

Inside the <head> tag.

If you have strings that are already in latin1, you can use mb_convert_encoding:

http://php.net/manual/en/function.mb-convert-encoding.php

$fixedStr = mb_convert_encoding($wrongStr, 'UTF-8', 'ISO-8859-1');

iconv does something very similar. Truth be told, I don't know the difference, but here's the link to the function reference: http://php.net/manual/en/function.iconv.php
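For a latin1 to UTF-8 conversion the two functions give the same result; the visible differences are that they come from different extensions (mbstring versus iconv, the latter wrapping the system's iconv library) and that iconv takes the source encoding first and the string last. A minimal sketch, with `"Caf\xE9"` as an example latin1 string:

```php
<?php
// A latin1 (ISO-8859-1) string: byte 0xE9 is "é" in that encoding.
$wrongStr = "Caf\xE9";

// mbstring: mb_convert_encoding(string, to, from)
$viaMbstring = mb_convert_encoding($wrongStr, 'UTF-8', 'ISO-8859-1');

// iconv extension: iconv(from, to, string)
$viaIconv = iconv('ISO-8859-1', 'UTF-8', $wrongStr);

var_dump($viaMbstring === "Café"); // bool(true)
var_dump($viaIconv === "Café");    // bool(true)
```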

I just realized that you might have some strings in utf8 and others in latin1. You can use mb_detect_encoding for that: http://php.net/manual/en/function.mb-detect-encoding.php
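A sketch of normalising such mixed values follows. It swaps in mb_check_encoding instead of mb_detect_encoding, because any byte string is valid ISO-8859-1, so the only question that matters here is whether a value is already valid UTF-8. The helper name `toUtf8` is hypothetical, and a latin1 string that happens to also be valid UTF-8 would be left untouched.

```php
<?php
/**
 * Hypothetical helper: return the value as UTF-8, converting from latin1
 * only when it is not already valid UTF-8.
 */
function toUtf8(string $s): string
{
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s; // already UTF-8 (this includes plain ASCII)
    }
    return mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1');
}

echo toUtf8("Caf\xE9"), PHP_EOL; // latin1 input, converted
echo toUtf8("Café"), PHP_EOL;    // UTF-8 input, returned unchanged
```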

You can also dump the database and run the iconv command-line tool over the dump, if you have it installed:

iconv -f ISO-8859-1 -t UTF-8 < currentdb.sql > fixeddb.sql