PostgreSQL improperly sorts unicode chars with Czech collation


This ordering is correct. The accents on á, ď, é, ě, í, ň, ó, ť, ú, ů, ý should be ignored when sorting; see the article.

Czech sorting rules are a little bit complex :)


PostgreSQL does not have its own sort rules; it uses the rules provided by the operating system. If you run /usr/bin/sort with the same locale, you'll get the same sort order.

Here's the result with your sample data when tried with Ubuntu 12.04, PostgreSQL 9.1:

CREATE COLLATION cs_CZ (locale = 'cs_CZ.UTF-8');
SELECT * FROM (VALUES ('Ca'), ('Čb'), ('Cc')) AS l(a) ORDER BY a COLLATE cs_CZ;

Result:

 a
----
 Ca
 Cc
 Čb
(3 rows)

Notice that it's sorted as you say it should.

If your operating system sorts differently and you're sure that it's wrong according to the official Czech rules, then it's a bug in its Czech locale implementation.
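To check what the operating system's locale does outside PostgreSQL, something like the following Python sketch may help. It uses the standard `locale` module, which delegates to the C library's collation. The function name `sort_with_os_locale` is made up for this illustration, and the locale name `cs_CZ.UTF-8` is assumed to be installed; if it is not, the function returns `None` instead of sorting.

```python
import locale


def sort_with_os_locale(words, locale_name="cs_CZ.UTF-8"):
    """Sort words with the operating system's collation for locale_name.

    Hypothetical helper for illustration. Returns None when the locale
    is not available on this machine.
    """
    # Remember the current collation setting so it can be restored.
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        # strxfrm turns each string into a key that compares
        # according to the active LC_COLLATE rules.
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


print(sort_with_os_locale(["Ca", "Čb", "Cc"]))
```

If the order printed here differs from what PostgreSQL returns for the same collation on the same machine, that would be worth investigating; if both agree, the ordering comes from the operating system's locale data.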

UPDATE following comment:

SELECT * FROM (VALUES ('A'), ('Da'), ('Ďb'), ('Dc'), ('E')) AS l(a) ORDER BY a COLLATE cs_CZ;

results in:

 a
----
 A
 Da
 Ďb
 Dc
 E


Sorting with the Czech collation is correct according to Czech grammar rules!

Characters like á, ď, é, ě, í, ň, ó, ť, ú, ů, ý are sorted as if they had no diacritics, so the result:

A, Da, Ďb, Dc, E is correct by Czech grammar.

To Slovaks and Czechs it can sound crazy, but "rules are rules".
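The rule described above can be illustrated with a small Python sort key. This is only a simplified sketch, not what PostgreSQL or the operating system actually runs. The name `czech_sort_key` is hypothetical. Treating č, ř, š, ž as letters of their own is an assumption added here (the Č case matches the Ca, Cc, Čb result shown earlier). The digraph "ch" and upper/lower case differences are deliberately left out.

```python
import unicodedata

# Assumption for this sketch: these letters sort as separate letters,
# immediately after their base letter.
SEPARATE_LETTERS = {"č": "c", "ř": "r", "š": "s", "ž": "z"}


def czech_sort_key(word):
    """Simplified two-level key: base letters first, accents as tie-breaker."""
    primary = []    # (base letter, 1 if it is a separate letter else 0)
    secondary = []  # 1 where an ignored accent was present, else 0
    for ch in word.lower():
        if ch in SEPARATE_LETTERS:
            primary.append((SEPARATE_LETTERS[ch], 1))
            secondary.append(0)
        else:
            # NFD splits e.g. "ď" into "d" plus a combining caron.
            decomposed = unicodedata.normalize("NFD", ch)
            primary.append((decomposed[0], 0))
            secondary.append(1 if len(decomposed) > 1 else 0)
    return (primary, secondary)


print(sorted(["E", "Dc", "Ďb", "Da", "A"], key=czech_sort_key))
print(sorted(["Čb", "Cc", "Ca"], key=czech_sort_key))
```

Because Ď compares like D at the first level, the second letter decides the order of Da, Ďb, Dc, while Č is ranked after every word starting with plain C.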

Different rules apply to the Slovak language (collation sk_SK), where the character pairs d-ď, t-ť, n-ň, l-ľ are in alphabetical order, like the Czech Ď in this case.