How can I convert a string from windows-1252 to utf-8 in Ruby?

How can I convert a string from windows-1252 to utf-8 in Ruby?


For Ruby 1.8.6, it appears you can use Ruby Iconv, part of the standard library:

Iconv documentation

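According to this helpful article, you can at least purge unwanted win-1252 characters (bytes that are not valid UTF-8) from your string like so:

require 'iconv'
# Appending a space and stripping it again with [0..-2] guards against
# an invalid byte at the very end of the string slipping through.
ic = Iconv.new('UTF-8//IGNORE', 'UTF-8')
valid_string = ic.iconv(untrusted_string + ' ')[0..-2]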
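Iconv was dropped from the standard library in Ruby 2.0, so the snippet above will not load on a current interpreter. As a rough sketch of the same clean-up on Ruby 2.1 or newer, String#scrub can be swapped in for Iconv. The helper name purge_invalid_utf8 and the sample string are made up for illustration.

```ruby
# Hypothetical helper: drop byte sequences that are not valid UTF-8.
# Uses String#scrub (Ruby 2.1+) in place of Iconv's //IGNORE.
def purge_invalid_utf8(untrusted_string)
  # dup so the caller's string is left alone, then label the bytes as UTF-8
  untrusted_string.dup.force_encoding(Encoding::UTF_8).scrub('')
end

dirty = "caf\xE9 ok".b            # \xE9 on its own is not valid UTF-8
clean = purge_invalid_utf8(dirty)
puts clean                        # the stray byte has been removed
puts clean.valid_encoding?
```

Note that this only discards the offending bytes; it does not convert them, which is what the full conversion is for.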

One might then attempt to do a full conversion like so:

ic = Iconv.new('UTF-8', 'WINDOWS-1252')
valid_string = ic.iconv(untrusted_string + ' ')[0..-2]


If you're on Ruby 1.9...

string_in_windows_1252 = database.get(...)
# => "FĂ„bulous"
string_in_windows_1252.encoding
# => "windows-1252"
string_in_utf_8 = string_in_windows_1252.encode('UTF-8')
# => "Fabulous"
string_in_utf_8.encoding
# => 'UTF-8'
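The snippet above assumes the string already reports windows-1252 as its encoding. When the bytes arrive untagged (binary, or wrongly labeled as UTF-8), a minimal sketch is to label them first with force_encoding and then transcode with encode. The helper name windows_1252_to_utf8 and the sample bytes are assumptions for illustration; the replace options are one possible way to handle any byte that has no mapping, not the only one.

```ruby
# Hypothetical helper: label raw bytes as Windows-1252, then transcode to UTF-8.
def windows_1252_to_utf8(raw_bytes)
  raw_bytes.dup
           .force_encoding(Encoding::Windows_1252) # relabel only, bytes unchanged
           .encode(Encoding::UTF_8,                # actually rewrites the bytes
                   invalid: :replace, undef: :replace, replace: '?')
end

raw = "F\xE4bulous".b                 # 0xE4 is "ä" in Windows-1252
converted = windows_1252_to_utf8(raw)
puts converted
puts converted.encoding               # UTF-8
```

force_encoding changes only the label Ruby attaches to the bytes, while encode produces new bytes in the target encoding; mixing the two up is a common source of garbled output.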


Hi,

I had the exact same problem.

These tips helped me get going:

Always check for the proper encoding name in order to feed your conversion tools correctly. When in doubt, you can get a list of supported encodings for iconv or recode using:

$ recode -l

or

$ iconv -l
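The same sanity check can be done from inside Ruby 1.9 or later, which keeps its own list of encoding names separate from iconv and recode. This is a small sketch; the helper name known_encoding? is made up for illustration.

```ruby
# Hypothetical helper: true if Ruby recognizes the given encoding name.
# Encoding.find raises ArgumentError for names it does not know.
def known_encoding?(name)
  Encoding.find(name)
  true
rescue ArgumentError
  false
end

puts known_encoding?('windows-1252')
puts known_encoding?('win-1252')        # shorthand names are not guaranteed to exist
puts Encoding.find('CP1252').name       # aliases resolve to the canonical name
puts Encoding.name_list.grep(/1252/).inspect
```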

Always start from your original file and encode a sample to work with:

$ recode windows-1252..u8 < original.txt > sample_utf8.txt

or

$ iconv -f windows-1252 -t utf8 original.txt -o sample_utf8.txt

Install Ruby 1.9, because it helps you A LOT when it comes to encodings. Even if you don't use it in your program, you can always start an irb1.9 session and poke at the strings to see what the output is.

File.open's 'mode' parameter can carry encodings in Ruby 1.9. Use it!

This article helped a lot: http://blog.nuclearsquid.com/writings/ruby-1-9-encodings

File.open('original.txt', 'r:windows-1252:utf-8')
# This opens a file specifying all encoding options.
# r:windows-1252 means read it as windows-1252.
# :utf-8 means treat it as utf-8 internally, so the data is transcoded to utf-8 as it is read.
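To see the mode string in action without touching a real file, here is a self-contained sketch that writes a few Windows-1252 bytes to a temporary file and reads them back as UTF-8. The helper name read_windows_1252_as_utf8 and the sample bytes are assumptions for illustration, and Tempfile.create needs Ruby 2.1 or newer.

```ruby
require 'tempfile'

# Hypothetical helper: read a Windows-1252 file and get UTF-8 strings back.
def read_windows_1252_as_utf8(path)
  # external encoding windows-1252, internal encoding utf-8
  File.open(path, 'r:windows-1252:utf-8') { |file| file.read }
end

Tempfile.create('original') do |tmp|
  tmp.binmode
  tmp.write("F\xE4bulous".b)          # raw Windows-1252 bytes, 0xE4 is "ä"
  tmp.flush
  text = read_windows_1252_as_utf8(tmp.path)
  puts text
  puts text.encoding                  # UTF-8
end
```

Doing the conversion at the file boundary means the rest of the program only ever sees UTF-8 strings.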

Have fun and swear a lot!