Why is table CHARSET set to utf8mb4 and COLLATION to utf8mb4_unicode_520_ci


In the past, there was only utf8; as of MySQL 8.0, utf8mb4 is the default character set.

In the past, _general_ci was the default collation; then _unicode_ci (Unicode 4.0) was better, then _unicode_520_ci (Unicode 5.2.0). As of MySQL 8.0, the default is _0900_ai_ci (Unicode 9.0).

Meanwhile, the road is full of potholes generated by MySQL's past mistakes. And WP designers are driving in a big tank that does not notice the potholes.

MySQL 5.6 was a big pothole that swallowed up many a WP user: InnoDB limited each indexed column to 767 bytes, WP puts indexes on the overly long VARCHAR(255), and with utf8mb4 such a column can need up to 4 bytes per character (255 × 4 = 1020 bytes). You are well past it by having 5.7.17. (Your future move to 8.0 will be less bumpy.)

That is, newly created databases/tables/columns on 5.7.7+ should not experience the 767 problem, but things migrated from older versions (5.5.3+) may have issues, especially if something causes you to change to utf8mb4.
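To see whether a given server is still exposed to the 767-byte limit, the relevant InnoDB settings can be inspected. This is only a sketch, assuming MySQL 5.7; the first two variables were removed in 8.0, so they return nothing there. The final query just spells out the worst-case byte arithmetic for an indexed VARCHAR.

```sql
-- Sketch for MySQL 5.7; the first two variables no longer exist in 8.0.
SHOW VARIABLES LIKE 'innodb_large_prefix';        -- ON by default since 5.7.7
SHOW VARIABLES LIKE 'innodb_file_format';         -- Barracuda by default since 5.7.7
SHOW VARIABLES LIKE 'innodb_default_row_format';  -- added in 5.7.9, default DYNAMIC

-- Worst-case bytes for an indexed VARCHAR, to compare against 767:
SELECT 255 * 3 AS utf8_255,      -- 765, fits
       255 * 4 AS utf8mb4_255,   -- 1020, too long
       191 * 4 AS utf8mb4_191;   -- 764, fits
```

The arithmetic shows why 191 is the longest utf8mb4 VARCHAR that can be fully indexed under the old limit.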

What to do? I'll probably run out of space trying to spell out all the options. So please provide the history of the data, the upgrade path (if any), the current settings, the ROW_FORMAT of the tables, the CHARACTER SET and COLLATION of the columns, and the output of SHOW VARIABLES LIKE 'char%';
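Most of that information can be collected with a few queries. A minimal sketch, assuming the WordPress database is already selected with USE, so that DATABASE() returns its name; the collation variables are an extra that is not asked for above but is usually useful alongside the charset ones.

```sql
-- Server and connection character set / collation settings
SHOW VARIABLES LIKE 'char%';
SHOW VARIABLES LIKE 'collation%';

-- ROW_FORMAT and default collation of every table in the current database
SELECT TABLE_NAME, ENGINE, ROW_FORMAT, TABLE_COLLATION
  FROM information_schema.TABLES
 WHERE TABLE_SCHEMA = DATABASE();

-- CHARACTER SET and COLLATION of every text column
SELECT TABLE_NAME, COLUMN_NAME, COLUMN_TYPE,
       CHARACTER_SET_NAME, COLLATION_NAME
  FROM information_schema.COLUMNS
 WHERE TABLE_SCHEMA = DATABASE()
   AND CHARACTER_SET_NAME IS NOT NULL
 ORDER BY TABLE_NAME, ORDINAL_POSITION;
```

SHOW CREATE TABLE on an individual table gives the same details in one place, including the index definitions.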

Where should you be? For 5.7.7+, utf8mb4 and utf8mb4_unicode_520_ci wherever practical. That charset gives you emoji and all of Chinese (utf8 does not). That collation is the best available in 5.7, although you might be hard pressed to notice where it matters.
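Getting an existing table there might look like the following. This is a sketch, not a recipe: wp_posts assumes the default WP table prefix, the database name wordpress is hypothetical, and a backup should be taken first. On a table that still has an old row format the conversion can fail on the 767-byte index limit, and CONVERT TO may widen TEXT column types to make room for the larger encoding.

```sql
-- Assumed names: table wp_posts (default WP prefix), database wordpress (hypothetical).
-- Take a backup before running anything like this.

-- Convert the table's default charset and all of its text columns
ALTER TABLE wp_posts
  CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci;

-- Change the database default so that new tables pick it up
ALTER DATABASE wordpress
  CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci;
```

The ALTER DATABASE statement only changes the default for tables created later; it does not touch existing tables.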

Note: the first part of a collation name is the only character set that it works with. That is, utf8_unicode_ci does not work with utf8mb4.
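The pairing rule is easy to check on a server. A small sketch; the mismatched statement is left commented out because MySQL rejects it with a "COLLATION ... is not valid for CHARACTER SET" error.

```sql
-- List the collations that can be used with utf8mb4
SHOW COLLATION WHERE Charset = 'utf8mb4';

-- Matching pair: works
SELECT _utf8mb4'abc' COLLATE utf8mb4_unicode_520_ci;

-- Mismatched pair: rejected by the server, so left commented out
-- SELECT _utf8mb4'abc' COLLATE utf8_unicode_ci;
```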