PostgreSQL improperly sorts unicode chars with Czech collation
The sort order is correct. The accents on á, ď, é, ě, í, ň, ó, ť, ú, ů, ý should be ignored when sorting (see article).
Czech sort rules are a little bit complex :)
PostgreSQL does not have its own sort rules, it uses the rules provided by the operating system. If you try with /usr/bin/sort
with the same locale, you'll get the same sort order.
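If you want to check the operating system's ordering without going through PostgreSQL or the shell, here is a minimal sketch in Python. It assumes the cs_CZ.UTF-8 locale is installed on the machine; the helper name sort_with_os_locale is made up for this illustration. locale.strxfrm delegates to the C library's collation routines, so the result should reflect the OS locale data rather than any rules built into Python.

```python
import locale


def sort_with_os_locale(words, locale_name="cs_CZ.UTF-8"):
    """Sort words using the operating system's collation rules.

    Hypothetical helper for illustration. Returns None when the
    requested locale is not installed on this machine.
    """
    # Remember the current collation setting so it can be restored.
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        # strxfrm turns each string into a key that compares
        # according to the active LC_COLLATE rules.
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


if __name__ == "__main__":
    print(sort_with_os_locale(["Ca", "Čb", "Cc"]))
```

The exact order printed depends on the locale data shipped with your system, which is the point of the check: if it differs from what PostgreSQL returns with the same locale, something other than the OS rules is involved.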
Here's the result with your sample data when tried with Ubuntu 12.04, PostgreSQL 9.1:
CREATE COLLATION cs_CZ (locale = 'cs_CZ.UTF-8');
SELECT * FROM (values('Ca'),('Čb'),('Cc')) AS l(a) ORDER BY a COLLATE cs_CZ;
Result:
 a
----
 Ca
 Cc
 Čb
(3 rows)
Notice that it's sorted the way you say it should be.
If your operating system sorts differently and you're sure that it's wrong according to the official Czech rules, then it's a bug in its Czech locale implementation.
UPDATE following comment:
SELECT * FROM (values('A'),('Da'),('Ďb'),('Dc'),('E')) AS l(a) ORDER BY a COLLATE cs_CZ;
results in:
 a
----
 A
 Da
 Ďb
 Dc
 E
Sorting with the Czech collation is correct according to Czech grammar rules!
Characters like á, ď, é, ě, í, ň, ó, ť, ú, ů, ý are sorted as if they had no diacritics, so the result
A, Da, Ďb, Dc, E is correct by Czech grammar.
To Slovak and Czech speakers this can sound crazy, but rules are rules.
The Slovak language (COLLATE sk_SK) has different rules: there the pairs d-ď, t-ť, n-ň, l-ľ are separate letters in alphabetical order, which is how you expected the Czech Ď to behave in this case.
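To make the Czech rule concrete, here is a simplified sketch of a sort key in Python. It is not what PostgreSQL or the OS locale actually runs; it only assumes the rule described above: č, ř, š, ž (and the digraph ch) count as separate letters, while all other diacritics are ignored at first and only break ties. The names SEPARATE_LETTERS and czech_key are invented for this example, and case is ignored entirely to keep it short.

```python
import unicodedata

# Assumption for this sketch: these letters sort as separate letters
# right after their base letter. The digraph "ch" is handled below.
SEPARATE_LETTERS = {"č": "c", "ř": "r", "š": "s", "ž": "z"}


def strip_accents(ch):
    """Return ch without combining marks (e.g. 'ď' -> 'd')."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def czech_key(word):
    """Simplified two-level key: base letters first, accents as tie-break."""
    primary = []    # decides the order in almost all cases
    secondary = []  # consulted only when the primary keys are equal
    text = word.lower()
    i = 0
    while i < len(text):
        if text.startswith("ch", i):
            primary.append(("h", 1))  # "ch" sorts right after "h"
            secondary.append(0)
            i += 2
            continue
        ch = text[i]
        if ch in SEPARATE_LETTERS:
            primary.append((SEPARATE_LETTERS[ch], 1))
            secondary.append(0)
        else:
            base = strip_accents(ch)
            primary.append((base, 0))
            # An accented letter sorts after its plain form on a tie.
            secondary.append(0 if base == ch else 1)
        i += 1
    return (primary, secondary)


if __name__ == "__main__":
    print(sorted(["A", "Da", "Ďb", "Dc", "E"], key=czech_key))
    print(sorted(["Ca", "Čb", "Cc"], key=czech_key))
```

With this key, Ďb lands between Da and Dc because ď is compared as d, while Čb lands after Cc because č is a letter of its own. A Slovak variant would, under the same assumptions, add ď, ť, ň, ľ to SEPARATE_LETTERS.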