Best practise for storing multilingual strings Best practise for storing multilingual strings postgresql postgresql

Best practise for storing multilingual strings


First make sure that the database locale can deal with different languages. Use an UTF-8 server-encoding. Optionally set LC_COLLATE = 'C' to be on neutral ground or use a collation for your first language to have a default sort order. Start by reading the chapter Collation Support in the manual.

I would strongly suggest that you use the latest version of PostgreSQL (9.1 at time of writing) because it has superior collation support.

As for the table structure: keep it simple. It sounds like there is a low, fixed number of languages to deal with. You could just have a column for each language then:

CREATE TABLE txt (  txt_id serial PRIMARY KEY ,txt    text NOT NULL -- master language NOT NULL? ,txt_fr text -- others can be NULL? ,txt_es text ,txt_de text);

This is pretty effective, even with many languages. NULL storage is very cheap.
If you have a varying number of languages to deal with, a separate table might be the better solution. This solution assumes that you have a "master language", where the string is always present:

CREATE TABLE txt (  txt_id serial PRIMARY KEY ,txt    text NOT NULL -- master language NOT NULL?);CREATE TABLE lang (  lang_abbr text PRIMARY KEY -- de, es, fr, ... ,lang      text NOT NULL ,note      text);

Or, if a (two-letter) abbreviation is enough, just create a enum type to identify the language.

CREATE TABLE txt_trans (  txt_id    int REFERENCES txt(txt_id) ON UPDATE CASCADE ON DELETE CASCADE ,lang_abbr text REFERENCES lang(lang_abbr) ON UPDATE CASCADE ,txt       text NOT NULL -- master language NOT NULL? ,CONSTRAINT txt_trans_pkey PRIMARY KEY (txt_id, lang_abbr));

Not treating the master language special and keeping all language variants in the same table might make handling in your app simpler. But it really depends on your requirements.