ruby 1.9: invalid byte sequence in UTF-8

ruby encoding utf-8

In Ruby 1.9.3 you can use String#encode to "ignore" invalid UTF-8 sequences. Here is a snippet that works in both 1.8 (via iconv) and 1.9 (via String#encode):

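    # Iconv is only needed on 1.8, which has no String#encode
    require 'iconv' unless String.method_defined?(:encode)

    if String.method_defined?(:encode)
      # 1.9+: ask String#encode to replace invalid byte sequences
      file_contents.encode!('UTF-8', 'UTF-8', :invalid => :replace)
    else
      # 1.8: //IGNORE tells iconv to drop bytes it cannot convert
      ic = Iconv.new('UTF-8', 'UTF-8//IGNORE')
      file_contents = ic.iconv(file_contents)
    end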
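The "invalid byte sequence" ArgumentError is typically raised only when a method such as a regex match has to interpret the bytes as characters, so a string can carry bad bytes for a while before anything fails. A minimal sketch for checking a string up front with String#valid_encoding?; the helper name `describe_utf8` and the sample strings are hypothetical, chosen only for illustration:

```ruby
# Hypothetical helper: report whether a string is valid UTF-8 and whether
# a regex match on it raises the "invalid byte sequence" error.
def describe_utf8(str)
  utf8 = str.dup.force_encoding(Encoding::UTF_8) # reinterpret the bytes, no conversion
  raised = false
  begin
    utf8 =~ /x/ # regex matching has to scan the string as characters
  rescue ArgumentError => e
    raised = e.message.include?("invalid byte sequence")
  end
  { valid: utf8.valid_encoding?, raises: raised }
end

p describe_utf8("plain ascii") # valid, no error
p describe_utf8("abc\xFFdef")  # 0xFF can never appear in UTF-8: invalid, and the match raises
```

Checking valid_encoding? first lets you run one of the cleanup conversions only on strings that actually need it.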

Or, if you have really troublesome input, you can do a double conversion from UTF-8 to UTF-16 and back to UTF-8:

    require 'iconv' unless String.method_defined?(:encode)

    if String.method_defined?(:encode)
      file_contents.encode!('UTF-16', 'UTF-8', :invalid => :replace, :replace => '')
      file_contents.encode!('UTF-8', 'UTF-16')
    else
      ic = Iconv.new('UTF-8', 'UTF-8//IGNORE')
      file_contents = ic.iconv(file_contents)
    end
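The UTF-16 round trip can also be wrapped in a non-mutating helper for Ruby 1.9 and later (no Iconv fallback). This is a sketch; `strip_invalid_utf8` is a hypothetical name. Converting to a genuinely different encoding matters because on some Ruby versions a same-encoding encode call skips conversion and leaves invalid bytes in place.

```ruby
# Hypothetical helper: drop invalid UTF-8 byte sequences by transcoding to
# UTF-16 (so every byte is really examined) and then back to UTF-8.
def strip_invalid_utf8(str)
  utf16 = str.encode('UTF-16', 'UTF-8', invalid: :replace, replace: '') # bad bytes become ""
  utf16.encode('UTF-8', 'UTF-16')                                      # back to UTF-8
end

broken = "caf\xC3\xA9 bar\xFF".dup.force_encoding('UTF-8') # valid "é", stray 0xFF at the end
clean  = strip_invalid_utf8(broken)

p clean                 # the é survives, the stray 0xFF is dropped
p clean.valid_encoding? # true
```

Because the non-bang encode returns new strings, the original input is left untouched, which is handy when you want to log the raw bytes.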

Neither the accepted answer nor the other answer worked for me. I found this post, which suggested:

string.encode!('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')

This fixed the problem for me.
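Declaring the source encoding as binary (ASCII-8BIT) means every byte above 0x7F counts as undefined rather than as part of a UTF-8 character, so undef: :replace with replace: '' removes all of them. A small sketch of that trade-off with a made-up string: the call never raises, but valid accented characters are stripped along with the garbage.

```ruby
# Bytes for "café bar" plus a stray 0xFF; the literal is tagged UTF-8 but invalid.
raw = "caf\xC3\xA9 bar\xFF"

# Treat the input as raw bytes: ASCII passes through, and every byte >= 0x80 is
# undefined in a binary -> UTF-8 conversion, so it is replaced with "" (dropped).
cleaned = raw.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')

p cleaned                 # prints "caf bar": the é is gone too
p cleaned.valid_encoding? # prints true
```

This is a reasonable choice when the input is expected to be mostly ASCII; for text with real non-ASCII characters, the UTF-16 round trip keeps them.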

My current solution is to run:

my_string.unpack("C*").pack("U*")

This at least gets rid of the exceptions, which were my main problem.
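unpack("C*") reads each byte as an integer from 0 to 255, and pack("U*") writes each integer back as the Unicode code point with that number, encoded as UTF-8. That amounts to reinterpreting the input as ISO-8859-1, which is why it can never raise, and also why input that was already valid multibyte UTF-8 gets double-encoded. A short demonstration with a made-up string:

```ruby
raw = "caf\xC3\xA9"       # valid UTF-8 bytes for "café"

bytes  = raw.unpack("C*") # one unsigned 8-bit integer per byte
result = bytes.pack("U*") # each integer becomes a code point, encoded as UTF-8

p bytes                          # [99, 97, 102, 195, 169]
p result.valid_encoding?         # true: no more exceptions
p result == "caf\u00C3\u00A9"    # true: "café" came back as "cafÃ©"
```

So this workaround fits input that really is Latin-1; on genuine UTF-8 it silently corrupts every non-ASCII character.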
