
When importing CSV data, how do we eliminate "invalid byte sequence in UTF-8"?


Ruby 1.9's CSV library has a new parser that is m17n-aware (multilingualization). The parser works in the Encoding of the IO or String object being read. The methods ::foreach, ::open, ::read, and ::readlines accept an optional :encoding option in which you can specify the Encoding.

For example:

CSV.read('/path/to/file', :encoding => 'windows-1251:utf-8')

This reads the file as Windows-1251 and converts all strings to UTF-8.
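
To make the "external:internal" form of the :encoding option concrete, here is a minimal sketch. The temporary file and its sample contents are made up for illustration; the only assumption about your data is that it really is Windows-1251.

```ruby
require 'csv'
require 'tempfile'

# Build a small Windows-1251 file so the example is self-contained.
# (Hypothetical sample data, not from the original question.)
utf8_source = "name,city\nИван,Москва\n"
file = Tempfile.new(['sample', '.csv'])
file.binmode
file.write(utf8_source.encode('Windows-1251'))
file.close

# "external:internal": read the bytes as Windows-1251, hand back UTF-8 strings.
rows = CSV.read(file.path, encoding: 'windows-1251:utf-8')

first_cell = rows[1][0]
cell_encoding = first_cell.encoding

file.unlink
```

If the external encoding named here does not match the file's real encoding, you will either get an error or silently wrong characters, so this only helps once you know what the file actually contains.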

You can also use the more standard encoding name 'ISO-8859-1':

CSV.read('/..', :headers => true, :col_sep => ';', :encoding => 'ISO-8859-1')


CSV.parse(File.read('/path/to/csv').scrub)
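
A short sketch of what the scrub approach above does, assuming Ruby 2.1 or later (where String#scrub is available). The sample string is hypothetical. Note that scrub does not recover the original character; it only replaces the bytes that are invalid in the string's current encoding, so it is a lossy fallback rather than a real conversion.

```ruby
require 'csv'

# "\xE9" is "é" in ISO-8859-1, but on its own it is not valid UTF-8.
# The literal below is tagged UTF-8, so it contains an invalid byte sequence.
raw = "name,city\nJos\xE9,Lima\n"

valid_before = raw.valid_encoding?

# Replace each invalid sequence with "?" (the default is U+FFFD).
cleaned = raw.scrub('?')
valid_after = cleaned.valid_encoding?

rows = CSV.parse(cleaned)
```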


I answered a similar question that deals with reading external files in 1.9.2 with non-UTF-8 encodings. I think that answer will help you a lot: "Character Encoding issue in Rails v3/Ruby 1.9.2".

Note that you need to know the source encoding in order to convert it to anything reliably. There are libraries like the one I linked to in my other answer that can help you determine this.

Also, if you aren't loading the data from a file, you can convert the encoding of a string in 1.9.2 quite easily:

'string'.encode('UTF-8')
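
Calling encode with only a target works when the string is already tagged with its true encoding. When it is not, a sketch like the following may help; it assumes the bytes are really ISO-8859-1, and the sample value is made up for illustration.

```ruby
# Bytes as they might arrive from a Latin-1 source, tagged as binary.
bytes = "caf\xE9".b

# Option 1: pass the source encoding as the second argument to encode.
converted = bytes.encode('UTF-8', 'ISO-8859-1')

# Option 2: relabel the bytes with their real encoding, then transcode.
relabeled = bytes.dup.force_encoding('ISO-8859-1').encode('UTF-8')

# If the declared source encoding cannot represent some bytes, replace them
# instead of raising. Here US-ASCII is a deliberately wrong guess.
lossy = bytes.encode('UTF-8', 'US-ASCII',
                     invalid: :replace, undef: :replace, replace: '?')
```

force_encoding only changes the label on the bytes, while encode actually rewrites them, which is why the two are combined in the second option.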

However, it's rare that you're building a string in another encoding, and it's best to convert it at the time it's read into your environment if possible.