ruby 1.9: invalid byte sequence in UTF-8


In Ruby 1.9.3 it is possible to use String#encode to "ignore" the invalid UTF-8 sequences. Here is a snippet that works in both 1.8 (iconv) and 1.9 (String#encode):

require 'iconv' unless String.method_defined?(:encode)
if String.method_defined?(:encode)
  file_contents.encode!('UTF-8', 'UTF-8', :invalid => :replace)
else
  ic = Iconv.new('UTF-8', 'UTF-8//IGNORE')
  file_contents = ic.iconv(file_contents)
end

Or, if you have really troublesome input, you can do a double conversion from UTF-8 to UTF-16 and back to UTF-8:

require 'iconv' unless String.method_defined?(:encode)
if String.method_defined?(:encode)
  file_contents.encode!('UTF-16', 'UTF-8', :invalid => :replace, :replace => '')
  file_contents.encode!('UTF-8', 'UTF-16')
else
  ic = Iconv.new('UTF-8', 'UTF-8//IGNORE')
  file_contents = ic.iconv(file_contents)
end
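
To see the double conversion in isolation, here is a minimal sketch of the String#encode branch only (no iconv), assuming a Ruby that has String#encode. The helper name strip_invalid_utf8 and the sample bytes are made up for illustration; they are not part of the original snippet.

```ruby
# Hypothetical helper: round-trips through UTF-16 so that bytes which are
# not valid UTF-8 are dropped (replaced with an empty string).
def strip_invalid_utf8(str)
  # Treat the input bytes as UTF-8 and transcode to UTF-16,
  # replacing every invalid sequence with ''.
  utf16 = str.encode('UTF-16', 'UTF-8', :invalid => :replace, :replace => '')
  # Transcode back to UTF-8; the intermediate string is valid UTF-16.
  utf16.encode('UTF-8', 'UTF-16')
end

# Made-up sample input: ASCII text with one stray 0xFF byte in the middle.
raw = "abc\xFFdef"
puts raw.valid_encoding?      # the raw string is not valid UTF-8

cleaned = strip_invalid_utf8(raw)
puts cleaned.valid_encoding?  # the cleaned string can be used safely
puts cleaned
```

The non-bang encode is used here so the original string is left untouched; the snippet above uses encode! to modify file_contents in place.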


Neither the accepted answer nor the other answer worked for me. I found this post, which suggested:

string.encode!('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')

This fixed the problem for me.
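
One thing worth knowing before using this: because the source encoding is given as 'binary', every byte above 0x7F has no mapping and is replaced, so valid non-ASCII characters are removed along with the broken ones. A small sketch of that behaviour, where the helper name to_ascii_utf8 and the sample string are made up for illustration:

```ruby
# Hypothetical wrapper around the call above, using the non-bang encode
# so the input string is not modified.
def to_ascii_utf8(str)
  str.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')
end

# Made-up sample: "café" (the é is the valid two-byte sequence C3 A9),
# a space, one stray 0xFF byte, a space, then "ok".
sample = "caf\xC3\xA9 \xFF ok"

result = to_ascii_utf8(sample)
puts result.inspect          # both the é and the stray byte are gone
puts result.valid_encoding?
```

If the input is known to be plain ASCII plus occasional junk, that is fine; if it contains real accented or non-Latin text, the UTF-16 round trip keeps more of it.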


My current solution is to run:

my_string.unpack("C*").pack("U*")

This at least gets rid of the exceptions, which were my main problem.
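
What unpack("C*").pack("U*") does is read each byte as a number and write that number back out as a Unicode code point, so the result is always valid UTF-8. In effect the input is treated as if it were Latin-1. A minimal sketch, with a made-up helper name (bytes_to_codepoints) and made-up sample strings:

```ruby
# Hypothetical helper wrapping the one-liner above.
def bytes_to_codepoints(str)
  # "C*" yields one integer (0..255) per byte;
  # "U*" encodes each integer as a UTF-8 code point.
  str.unpack("C*").pack("U*")
end

# A Latin-1 style byte: 0xE9 is not valid UTF-8 on its own,
# but as code point U+00E9 it becomes "é".
latin1_like = "caf\xE9"
puts bytes_to_codepoints(latin1_like)

# Input that was already valid UTF-8 gets mangled: the two bytes of "é"
# (C3 A9) become two separate characters, U+00C3 and U+00A9.
already_utf8 = "caf\xC3\xA9"
puts bytes_to_codepoints(already_utf8)
```

So the exceptions go away, but any multi-byte characters that were already correct come out as two or more characters each.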