SET NAMES utf8 in MySQL? SET NAMES utf8 in MySQL? mysql mysql

SET NAMES utf8 in MySQL?


It is needed whenever you want to send data to the server having characters that cannot be represented in pure ASCII, like 'ñ' or 'ö'.

That if the MySQL instance is not configured to expect UTF-8 encoding by default from client connections (many are, depending on your location and platform.)

Read http://www.joelonsoftware.com/articles/Unicode.html in case you aren't aware how Unicode works.

Read Whether to use "SET NAMES" to see SET NAMES alternatives and what exactly is it about.


From the manual:

SET NAMES indicates what character set the client will use to send SQL statements to the server.

More elaborately, (and once again, gratuitously lifted from the manual):

SET NAMES indicates what character set the client will use to send SQL statements to the server. Thus, SET NAMES 'cp1251' tells the server, “future incoming messages from this client are in character set cp1251.” It also specifies the character set that the server should use for sending results back to the client. (For example, it indicates what character set to use for column values if you use a SELECT statement.)


Getting encoding right is really tricky - there are too many layers:

  • Browser
  • Page
  • PHP
  • MySQL

The SQL command "SET CHARSET utf8" from PHP will ensure that the client side (PHP) will get the data in utf8, no matter how they are stored in the database. Of course, they need to be stored correctly first.

DDL definition vs. real data

Encoding defined for a table/column doesn't really mean that the data are in that encoding. If you happened to have a table defined as utf8 but stored as differtent encoding, then MySQL will treat them as utf8 and you're in trouble. Which means you have to fix this first.

What to check

You need to check in what encoding the data flow at each layer.

  • Check HTTP headers, headers.
  • Check what's really sent in body of the request.
  • Don't forget that MySQL has encoding almost everywhere:
    • Database
    • Tables
    • Columns
    • Server as a whole
    • Client
      Make sure that there's the right one everywhere.

Conversion

If you receive data in e.g. windows-1250, and want to store in utf-8, then use this SQL before storing:

SET NAMES 'cp1250';

If you have data in DB as windows-1250 and want to retreive utf8, use:

SET CHARSET 'utf8';

Few more notes:

  • Don't rely on too "smart" tools to show the data. E.g. phpMyAdmin does (was doing when I was using it) encoding really bad. And it goes through all the layers so it's hard to find out.
  • Also, Internet Explorer had really stupid behavior of "guessing" the encoding based on weird rules.
  • Use simple editors where you can switch encoding. I recommend MySQL Workbench.