how to deal with accents and strange characters in a database?



Collation affects only how text is sorted and compared; it has no effect on the actual character set of the stored data.

I would recommend this configuration:

  1. Set the character set for the whole database only, so you don't have to set it for each table separately. The character set is inherited from the database by its tables, and from each table by its columns. Use utf8 as the character set.

  2. Set the character set for the DB connection. Execute these queries after you connect to the database:

    SET CHARACTER SET 'utf8';
    SET NAMES 'utf8';

  3. Set the character set for the page, using an HTTP header and/or an HTML meta tag. One of these is enough. Use utf-8 as the charset.
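For step 3, here is a minimal sketch in Python rather than PHP, using the standard-library WSGI interface; the `app` function and the page content are hypothetical. It declares UTF-8 in both the `Content-Type` header and the `<meta>` tag and, just as important, actually encodes the body as UTF-8.

```python
def app(environ, start_response):
    """Minimal WSGI app that serves a page declared and encoded as UTF-8."""
    # Charset declared in the HTML <meta> tag...
    html = (
        '<!DOCTYPE html><html><head><meta charset="utf-8">'
        "<title>Demo</title></head>"
        "<body>Año, canción, pingüino</body></html>"
    )
    body = html.encode("utf-8")  # the bytes sent must really be UTF-8
    # ...and in the HTTP header; either declaration alone is enough.
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, `wsgiref.simple_server.make_server("localhost", 8000, app).serve_forever()` should serve the page.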

This should be enough.

If you want Spanish strings to sort properly, set the collation for the whole database; utf8_spanish_ci should work (ci means case-insensitive). Without a proper collation, accented Spanish characters may be sorted after all the unaccented letters.
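To see why collation matters, here is a rough Python approximation of a Spanish case-insensitive sort order. `spanish_ci_key` is a hypothetical helper, not MySQL's actual algorithm: it only makes accented vowels sort like their base letter and puts ñ between n and o. Plain code-point order, by contrast, pushes accented words past z.

```python
import unicodedata

def spanish_ci_key(word: str):
    """Rough approximation of a Spanish, case-insensitive collation key.

    Accented vowels sort like their base letter, and ñ sorts as its own
    letter just after n. Real MySQL collations have more rules than this.
    """
    key = []
    for ch in word.lower():
        if ch == "ñ":
            key.append((ord("n"), 1))  # right after plain n, before o
        else:
            base = unicodedata.normalize("NFD", ch)[0]  # drop any accent mark
            key.append((ord(base), 0))
    return key

words = ["zapato", "árbol", "ñandú", "oso", "nube"]
print(sorted(words))                      # ['nube', 'oso', 'zapato', 'árbol', 'ñandú']
print(sorted(words, key=spanish_ci_key))  # ['árbol', 'nube', 'ñandú', 'oso', 'zapato']
```

The stored strings are identical in both cases; only the ordering rule changes, which is exactly the part a collation controls.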

Note: the data you already have in a table may itself be broken, because your character set configuration was previously wrong. Check it first with a DB client to rule this out. If it is broken, re-insert your data with the correct character set configuration.
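One common symptom of such broken data is UTF-8 text that was read as Latin-1 and stored again, so "canción" shows up as "canciÃ³n". The Python sketch below is a heuristic for spotting and undoing one round of that damage; the function names are hypothetical. It assumes plain Latin-1, whereas MySQL's `latin1` is really cp1252, so damage involving characters such as € or curly quotes will not be detected this way. Re-inserting the data correctly, as described above, is still the cleaner fix.

```python
def looks_double_encoded(text: str) -> bool:
    """Heuristic: True if text looks like UTF-8 bytes that were decoded as Latin-1."""
    try:
        repaired = text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return False  # cannot be undone this way, so probably not this kind of damage
    return repaired != text  # pure ASCII round-trips unchanged and is not flagged

def repair(text: str) -> str:
    """Undo one round of UTF-8-read-as-Latin-1 damage, if detected."""
    if looks_double_encoded(text):
        return text.encode("latin-1").decode("utf-8")
    return text

print(looks_double_encoded("canciÃ³n"))  # True
print(repair("canciÃ³n"))                # canción
print(looks_double_encoded("canción"))   # False
```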

How character sets work in a database

  • objects have a character set attribute, which is either set explicitly or inherited (server > database > table > column), so the best option is to set it for the whole database

  • the client connection also has a character set attribute, which tells the database the encoding of the data you're sending
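The inheritance rule in the first bullet can be sketched as "the most specific explicit setting wins". The `effective_charset` function below is a hypothetical illustration, not a MySQL API, and it simplifies one detail: MySQL copies the default into a table or column when that object is created, so changing a database default later does not change existing tables.

```python
from typing import Optional

def effective_charset(server: str,
                      database: Optional[str] = None,
                      table: Optional[str] = None,
                      column: Optional[str] = None) -> str:
    """Return the charset that applies to a column.

    Any level left as None inherits from the level above it
    (server > database > table > column); the most specific explicit value wins.
    """
    for level in (column, table, database):
        if level is not None:
            return level
    return server

print(effective_charset("latin1"))                                      # latin1
print(effective_charset("latin1", database="utf8mb4"))                  # utf8mb4
print(effective_charset("latin1", database="utf8mb4", column="ascii"))  # ascii
```

Setting it once at the database level means every table and column created afterwards picks it up without further work.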

If the client connection's character set differs from the target object's, the data you send is automatically converted from the connection's character set to the object's character set.

So if, for example, your data is in utf8 but the client connection is set to latin1, the database will corrupt the data, because it will convert the utf8 bytes as if they were latin1.
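The conversion described in the two paragraphs above can be simulated in a few lines of Python. `store` is a hypothetical stand-in for what the server does: decode the incoming bytes using the declared connection charset, then re-encode them in the column's charset. Python's `latin-1` codec is used as a simplification of MySQL's `latin1`.

```python
def store(sent: bytes, connection_charset: str, column_charset: str) -> bytes:
    """Simulate the server converting incoming bytes into the column's charset.

    The server trusts the declared connection charset to interpret the bytes,
    then re-encodes the resulting characters in the column's charset.
    """
    return sent.decode(connection_charset).encode(column_charset)

data = "ñ".encode("utf-8")  # the application really sends UTF-8 bytes

ok = store(data, "utf-8", "utf-8")
broken = store(data, "latin-1", "utf-8")  # connection wrongly declared as latin1

print(ok.decode("utf-8"))      # ñ
print(broken.decode("utf-8"))  # Ã±
```

The two UTF-8 bytes of "ñ" are each treated as a separate Latin-1 character, which is where the familiar "Ã±" garbage comes from.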


Here is my checklist for storing UTF-8 characters. First, though, make sure the failure really happens where you store the strings in the database, meaning the string you are about to store is still exactly what the user entered.
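One way to check the input side before blaming the database is to validate and inspect the raw bytes you received. The `inspect_input` helper below is a hypothetical Python sketch: it decodes the bytes as UTF-8 and reports the first invalid byte, and the hex dump shows what correctly encoded input should look like.

```python
def inspect_input(raw: bytes) -> str:
    """Check that incoming bytes are valid UTF-8 before blaming the database.

    Returns the decoded text, or raises ValueError naming the first invalid
    byte, which points to a problem on the input side rather than in storage.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError(
            f"not valid UTF-8 at byte {exc.start}: {raw[exc.start:exc.end]!r}"
        ) from exc

print(inspect_input("año".encode("utf-8")))                  # año
print(" ".join(f"{b:02x}" for b in "año".encode("utf-8")))  # 61 c3 b1 6f
```

If input that arrives as Latin-1 (for example `b"a\xf1o"`) already fails here, the database is not the culprit.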

First, make sure the character set of the table you use is utf8, or better yet utf8mb4 for full Unicode support (though it has drawbacks too: each character can take up to 4 bytes, so fewer characters fit in an index key). It doesn't matter which charset has been set for the entire database; the table definition overrides it, if specified. The DDL for creating such a table would look like this:

    CREATE TABLE table_name (
        id INT AUTO_INCREMENT NOT NULL,
        name VARCHAR(190) NOT NULL,
        date_created DATETIME NOT NULL,
        PRIMARY KEY (id)
    ) DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci ENGINE = InnoDB;
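The reason to prefer utf8mb4 is that MySQL's legacy `utf8` (also called utf8mb3) stores at most 3 bytes per character, while some characters, such as most emoji, need 4 bytes in UTF-8. The hypothetical Python helper below checks whether a string would fit in a 3-byte-per-character column.

```python
def fits_mysql_utf8(text: str) -> bool:
    """True if every character needs at most 3 bytes in UTF-8.

    MySQL's legacy 'utf8' (utf8mb3) stores at most 3 bytes per character,
    so characters outside the Basic Multilingual Plane need utf8mb4.
    """
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)

print(fits_mysql_utf8("canción"))  # True
print(fits_mysql_utf8("ok 😀"))    # False: the emoji takes 4 bytes
```

Accented Spanish letters all fit in utf8, so the original question is covered either way; utf8mb4 only becomes necessary for 4-byte characters.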

Second, use a UTF-8 charset for the database connection (utf8mb4 in this example, matching the table):

    // This should be enough
    $pdo = new PDO(
        'mysql:host=localhost;dbname=xxxxx;charset=utf8mb4',
        'username',
        'password'
    );


For MySQL, use this code after opening the database connection:

    // $dbh is an open PDO connection; use utf8mb4 instead if your tables are utf8mb4
    $dbh->exec("SET NAMES utf8");