Ruby 1.9: how can I properly upcase & downcase multibyte strings?
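For anybody arriving from Google after searching for "ruby upcase utf8": the solution is to use mb_chars, which ActiveSupport (Rails) adds to String:

    > "your problem chars here çöğıü Iñtërnâtiônàlizætiøn".mb_chars.upcase.to_s
    => "YOUR PROBLEM CHARS HERE ÇÖĞIÜ IÑTËRNÂTIÔNÀLIZÆTIØN"

Documentation: see String#mb_chars in the ActiveSupport API docs.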
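Outside Rails, mb_chars only exists once ActiveSupport's string extensions have been loaded. Here is a minimal sketch of that. It assumes the activesupport gem may or may not be installed. The helper name unicode_upcase is made up for illustration, and the fallback branch assumes Ruby 2.4 or later.

```ruby
# Try to load ActiveSupport's multibyte string extensions.
begin
  require "active_support"
  require "active_support/core_ext/string/multibyte"
rescue LoadError
  # ActiveSupport is not installed; the helper below falls back to String#upcase.
end

# Hypothetical helper: upcase via mb_chars when it is available,
# otherwise rely on the built-in String#upcase (Unicode-aware from Ruby 2.4).
def unicode_upcase(str)
  if str.respond_to?(:mb_chars)
    str.mb_chars.upcase.to_s
  else
    str.upcase
  end
end

puts unicode_upcase("çöğü Iñtërnâtiônàlizætiøn")
```

On Ruby 1.9 without ActiveSupport the fallback branch would only upcase the ASCII letters, so there the gem really is required.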
Case conversion is locale-dependent and doesn't always round-trip (for example, German "ß" upcases to "SS", which downcases to "ss" rather than back to "ß"), which is why Ruby 1.9 doesn't cover it.
The unicode_utils gem should address your needs.
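To make the round-trip and locale points concrete, here is a small sketch. It assumes Ruby 2.4 or later, where the built-in methods handle non-ASCII text, so it will not behave this way on 1.9. round_trips? is a hypothetical helper name.

```ruby
# Assumes Ruby 2.4+; on 1.9 these methods only touch ASCII letters.

# Hypothetical helper: true if upcasing then downcasing returns the original.
def round_trips?(str)
  str.upcase.downcase == str
end

puts round_trips?("hello")   # prints true: plain ASCII survives the round trip
puts round_trips?("straße")  # prints false: "ß" upcases to "SS", which downcases to "ss"

# Locale dependence: the same letter maps differently under Turkic rules.
puts "i".upcase              # prints I
puts "i".upcase(:turkic)     # prints İ (capital I with dot above)
```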
Case conversion is complicated and locale-dependent. Fortunately, Martin Dürst added full Unicode case mapping in Ruby 2.4:
    puts RUBY_DESCRIPTION

    # sd: mixed-case sample, su: the same text in upper case
    sd, su = "Iñtërnâtiônàlizætiøn", "IÑTËRNÂTIÔNÀLIZÆTIØN"

    # Print one row: label, upcased sample, downcased sample
    def ps(u, d, k)
      puts "%-30s: %24s / %-24s" % [k, u, d]
    end

    ps sd.upcase,              su.downcase,              "Ruby 2.4 (default)"
    ps sd.upcase(:ascii),      su.downcase(:ascii),      "Ruby 2.4 (ascii)"
    ps sd.upcase(:turkic),     su.downcase(:turkic),     "Ruby 2.4 (turkic)"
    ps sd.upcase(:lithuanian), su.downcase(:lithuanian), "Ruby 2.4 (lithuanian)"
    ps "-",                    su.downcase(:fold),       "Ruby 2.4 (fold)"
Output:
    ruby 2.4.0dev (2016-06-24 trunk 55499) [x86_64-linux]
    Ruby 2.4 (default)            :     IÑTËRNÂTIÔNÀLIZÆTIØN / iñtërnâtiônàlizætiøn
    Ruby 2.4 (ascii)              :     IñTëRNâTIôNàLIZæTIøN / iÑtËrnÂtiÔnÀlizÆtiØn
    Ruby 2.4 (turkic)             :     IÑTËRNÂTİÔNÀLİZÆTİØN / ıñtërnâtıônàlızætıøn
    Ruby 2.4 (lithuanian)         :     IÑTËRNÂTIÔNÀLIZÆTIØN / iñtërnâtiônàlizætiøn
    Ruby 2.4 (fold)               :                        - / iñtërnâtiônàlizætiøn
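The :fold option shown in the last row is mostly useful for caseless comparison rather than for display. Here is a minimal sketch, assuming Ruby 2.4 or later. same_ignoring_case? is a hypothetical helper name, and it assumes both strings use the same Unicode normalization form.

```ruby
# Assumes Ruby 2.4+. Compares two strings after Unicode case folding.
def same_ignoring_case?(a, b)
  a.downcase(:fold) == b.downcase(:fold)
end

puts same_ignoring_case?("Iñtërnâtiônàlizætiøn", "IÑTËRNÂTIÔNÀLIZÆTIØN")  # prints true
puts same_ignoring_case?("straße", "STRASSE")  # prints true: :fold maps "ß" to "ss"

# Plain downcase is not enough here, because "ß" stays "ß":
puts "straße".downcase == "STRASSE".downcase   # prints false
```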