Ruby 1.9: Convert byte array to string with multibyte UTF-8 characters


This has to do with how pack interprets its input data. The U* directive in your example treats each integer as a Unicode codepoint and encodes it as UTF-8. Your array already holds UTF-8 bytes, so 195 and 169 (which together encode é) are each encoded again as separate characters, thus the double encoding. Instead, just pack the values as raw bytes with C* and interpret the result as UTF-8:

irb(main):010:0> [67, 97, 102, 195, 169].pack('C*').force_encoding('utf-8')
=> "Café"
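
To make the difference between the two directives concrete, here is a minimal sketch. The helper name bytes_to_utf8 is hypothetical (it is not part of Ruby), and the validity check is an addition of mine, since force_encoding only relabels the string and does not verify the bytes:

```ruby
# Hypothetical helper: turn an array of UTF-8 byte values into a UTF-8 string.
def bytes_to_utf8(bytes)
  # 'C*' writes each integer as one raw byte; the result is a binary string.
  str = bytes.pack('C*').force_encoding('UTF-8')
  # force_encoding does not validate, so check explicitly (my assumption
  # that raising is the desired behaviour for bad input).
  raise ArgumentError, 'byte sequence is not valid UTF-8' unless str.valid_encoding?
  str
end

bytes = [67, 97, 102, 195, 169]

# 'U*' treats 195 and 169 as codepoints U+00C3 and U+00A9 and encodes each
# one as two UTF-8 bytes, which is the double encoding described above.
double_encoded = bytes.pack('U*')
double_encoded.bytes.to_a  # => [67, 97, 102, 195, 131, 194, 169]

fixed = bytes_to_utf8(bytes)
fixed.length               # => 4
```

Note that pack('C*') on its own returns a string tagged as binary (ASCII-8BIT), which is why the force_encoding step is needed before the result behaves as text.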


You specifically ask about a byte array, but maybe codepoints are more suitable, since those are what U* expects:

ar = 'Café'.codepoints.to_a
# => [67, 97, 102, 233]
ar.pack('U*')
# => "Café"
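
If you end up with one representation and need the other, the two can be converted into each other. A small sketch, assuming the input is valid UTF-8; both method names are hypothetical, chosen only for illustration:

```ruby
# Hypothetical helper: UTF-8 byte values -> Unicode codepoints.
def bytes_to_codepoints(bytes)
  bytes.pack('C*').force_encoding('UTF-8').codepoints.to_a
end

# Hypothetical helper: Unicode codepoints -> UTF-8 byte values.
def codepoints_to_bytes(codepoints)
  codepoints.pack('U*').bytes.to_a
end

bytes_to_codepoints([67, 97, 102, 195, 169])  # => [67, 97, 102, 233]
codepoints_to_bytes([67, 97, 102, 233])       # => [67, 97, 102, 195, 169]
```

The same pairing holds in the other direction: unpack('C*') yields the bytes of a string and unpack('U*') yields its codepoints.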