Multi language database encoding in search engine
You have several issues to solve to make this work correctly.
First, you've chosen the utf8
character set to hold all your text. That is a good choice. If this is a new-in-2016 application, you might choose the utf8mb4
character set instead. Once you have chosen a character set your users should be able to read your text.
Second, for the sake of searching and sorting (WHERE
and ORDER BY
) you need to choose an appropriate collation for each language. For modern German, utf8_general_ci
will work tolerably well. utf8_unicode_ci
works a little better if you need standard lexical ordering. Read this. http://dev.mysql.com/doc/refman/5.7/en/charset-unicode-sets.html
For modern Spanish, you should use utf8_spanish_ci
. That's because in Spanish the N and Ñ characters are not considered the same. I don't know whether the general collation works for Turkish.
Notice that you seem to have confused the notions of character set and collation in your question. You've mentioned a collation with your Turkish column and a character set with your German column.
You can explicitly specify character set and collation in queries. For example, you can write
WHERE _utf8 'München' COLLATE utf8_unicode_ci = table.name;
In this expression, _utf8 'München'
is a character constant, and
constant COLLATE utf8_unicode_ci = table.name
is a query specifier which includes an explicit collation name. Read this.http://dev.mysql.com/doc/refman/5.7/en/charset-collate.html
Third, you may want to assign a default collation to each language specific column. Default collations are baked into indexes, so they'll help accelerate searching.
Fourth, your users will need to use an appropriate input method (keyboard mapping, etc) to present data to your application. Turkish-language users hopefully know how to type Turkish words.