The dreaded MySQL import encoding issue - revisited
This is how I ended up solving my problem:
First mysqldump -uusername -ppassword --default-character-set=latin1 database -r dump.sql
Then run this script:
$search = array('/latin1/');$replace = array('utf8');foreach (range(128, 255) as $dec) { $search[] = "/\x".dechex($dec)."/"; $replace[] = "&#$dec;";}$input = fopen('dump.sql', 'r');$output = fopen('result.sql', 'w');while (!feof($input)) { $line = fgets($input); $line = preg_replace($search, $replace, $line); fwrite($output, $line);}fclose($input);fclose($output);
The script finds all the hex characters above 127 and encoded them into their HTML entities.
Then mysql -uusername -ppassword database < result.sql
A common problem with older WordPress databases and even newer ones is that the database tables get set as latin-1 but the contents are actually encoded as UTF-8. If you try to export as UTF-8 MySQL will attempt to convert the (supposedly) Latin-1 data to UTF-8 resulting in double encoded characters since the data was already UTF-8.
The solution is to export the tables as latin-1. Since MySQL thinks they are already latin-1 it will do a straight export.
Change the character set from ‘latin1′ to ‘utf8′.Since the dumped data was not converted during the export process, it’s actually UTF-8 encoded data.
Create your new table as UTF-8 If your CREATE TABLE command is in your SQL dump file, change the character set from ‘latin1′ to ‘utf8′.
Import your data normally. Since you’ve got UTF-8 encoded data in your dump file, the declared character set in the dump file is now UTF-8, and the table you’re importing into is UTF-8, everything will go smoothly
I was able to resolve this issue by modifying my wp-config.php as follows:
/** Database Charset to use in creating database tables. */define('DB_CHARSET', 'utf8');/** The Database Collate type. Don't change this if in doubt. */define( 'DB_COLLATE', 'utf8_general_ci' );