Convert non-ASCII chars from ASCII-8BIT to UTF-8

This works for me:

#encoding: ASCII-8BITstr = "\xC2\xA92011 AACR"p str, str.encoding#=> "\xC2\xA92011 AACR"#=> #<Encoding:ASCII-8BIT>str.force_encoding('UTF-8')p str, str.encoding#=> "©2011 AACR"#=> #<Encoding:UTF-8>

ruby utf-8 internationalization

There are two possibilities:

The input data is already UTF-8, but Ruby just doesn't know it. That seems to be your case, as "\xC2\xA9" is valid UTF-8 for the copyright symbol. In which case you just need to tell Ruby that the data is already UTF-8 using force_encoding.
For example "\xC2\xA9".force_encoding('ASCII-8BIT') would recreate the relevant bit of your input data. And "\xC2\xA9".force_encoding('ASCII-8BIT').force_encoding('UTF-8') would demonstrate that you can tell Ruby that it is really UTF-8 and get the desired result.
The input data is in some other encoding and you need Ruby to transcode it to UTF-8. In that case you'd have to tell Ruby what the current encoding is (ASCII-8BIT is ruby-speak for binary, it isn't a real encoding), then tell Ruby to transcode it.
For example, say your input data was ISO-8859-1. In that encoding the copyright symbol is just "\xA9". This would generate such a bit of data: "\xA9".force_encoding('ISO-8859-1') And this would demonstrate that you can get Ruby to transcode that to UTF-8: "\xA9".force_encoding('ISO-8859-1').encode('UTF-8')

ruby utf-8 internationalization

I used to do this for a script that scraped Greek Windows-encoded pages, using open-uri, iconv and Hpricot:

doc = open(DATA_URL)doc.rewinddata = Hpricot(Iconv.conv('utf-8', "WINDOWS-1253", doc.readlines.join("\n")))

I believe that was Ruby 1.8.7, not sure how things are with ruby 1.9

CodeHunter

Convert non-ASCII chars from ASCII-8BIT to UTF-8

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last