What is the difference between #encode and #force_encoding in Ruby?
The difference is significant. force_encoding sets the encoding label of the given string but does not change the string itself, i.e. it does not change its byte representation in memory:
'łał'.bytes #=> [197, 130, 97, 197, 130]
'łał'.force_encoding('ASCII').bytes #=> [197, 130, 97, 197, 130]
'łał'.force_encoding('ASCII') #=> "\xC5\x82a\xC5\x82"
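To make the "same bytes, different reading" point concrete, here is a minimal sketch. The helper `string_facts` is a made-up name for illustration, not part of Ruby; it only collects values that String already exposes.

```ruby
# Hypothetical helper (not part of Ruby): summarizes how a string is stored and read.
def string_facts(str)
  {
    encoding: str.encoding.name,
    bytes:    str.bytes,
    chars:    str.length,
    valid:    str.valid_encoding?
  }
end

original  = "łał"
# dup first, because force_encoding mutates its receiver
relabeled = original.dup.force_encoding("ASCII")

p string_facts(original)   # UTF-8, 5 bytes, 3 characters, valid
p string_facts(relabeled)  # US-ASCII, the same 5 bytes, 5 characters, not valid
```

The bytes are identical in both cases. Only the character count and validity change, because US-ASCII reads every byte as one character and rejects bytes above 127.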
encode assumes that the current encoding is correct and changes the bytes so that the string reads the same way in the second encoding:
'łał'.encode('UTF-16') #=> 'łał'
'łał'.encode('UTF-16').bytes #=> [254, 255, 1, 66, 0, 97, 1, 66]
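A small sketch of the same idea, assuming UTF-16BE as the target so that no byte order mark (the leading 254, 255 above) is prepended. `round_trip` is a hypothetical helper name used only for this illustration.

```ruby
# Hypothetical helper: transcodes to `target` and back to the original encoding.
def round_trip(str, target)
  str.encode(target).encode(str.encoding)
end

word    = "łał"
utf16be = word.encode("UTF-16BE")

p word.bytes      # => [197, 130, 97, 197, 130]
p utf16be.bytes   # => [1, 66, 0, 97, 1, 66]
p utf16be.length  # => 3, still three characters
p round_trip(word, "UTF-16BE") == word  # => true
```

The byte sequences differ completely, yet the character count is the same and transcoding back recovers the original string.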
In short, force_encoding changes how the string is read from its bytes, while encode changes how the string is written to bytes without changing the characters it represents (if possible).
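The "if possible" caveat matters when the target encoding has no representation for a character. The sketch below shows the default behaviour (an exception) and one way to substitute a placeholder instead. `ascii_with_placeholders` is a hypothetical helper name; the `undef:` and `invalid:` options are standard String#encode options.

```ruby
# Hypothetical helper: transcodes to US-ASCII, substituting "?" for
# characters that US-ASCII cannot represent.
def ascii_with_placeholders(str)
  str.encode("US-ASCII", undef: :replace, invalid: :replace)
end

begin
  "łał".encode("US-ASCII")   # "ł" has no US-ASCII equivalent
rescue Encoding::UndefinedConversionError => e
  puts "cannot transcode: #{e.message}"
end

p ascii_with_placeholders("łał")  # => "?a?"
```

Whether silently replacing characters is acceptable depends on the application; raising is the safer default because it does not lose data unnoticed.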
Read this: Changing an encoding
The associated Encoding of a String can be changed in two different ways.
First, it is possible to set the Encoding of a string to a new Encoding without changing the internal byte representation of the string, with String#force_encoding. This is how you can tell Ruby the correct encoding of a string.
Example:
string = "R\xC3\xA9sum\xC3\xA9"
string.encoding #=> #<Encoding:ISO-8859-1>
string.force_encoding(Encoding::UTF_8) #=> "R\u00E9sum\u00E9"
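A typical situation where this applies is data that arrives tagged as binary, for example from a socket or a file opened in binary mode. The following is a rough sketch under that assumption; `label_as` is a hypothetical helper, and the binary string is built by hand with String#b rather than read from a real source.

```ruby
# Hypothetical helper: relabels a copy of `bytes` and reports whether
# the bytes form a valid sequence under that label.
def label_as(bytes, encoding)
  labeled = bytes.dup.force_encoding(encoding)
  [labeled, labeled.valid_encoding?]
end

raw = "R\xC3\xA9sum\xC3\xA9".b   # UTF-8 bytes tagged as binary

as_utf8, utf8_ok = label_as(raw, Encoding::UTF_8)
p utf8_ok               # => true
p as_utf8 == "Résumé"   # => true
p as_utf8.length        # => 6

_, ascii_ok = label_as(raw, Encoding::US_ASCII)
p ascii_ok              # => false
```

Note that valid_encoding? can only rule a label out. A single-byte encoding such as ISO-8859-1 accepts every byte sequence, so a wrong label of that kind passes the check and has to be caught by knowing where the data came from.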
Second, it is possible to transcode a string, i.e. translate its internal byte representation to another encoding. Its associated encoding is also set to the other encoding. See String#encode for the various forms of transcoding, and the Encoding::Converter class for additional control over the transcoding process.
Example:
string = "R\u00E9sum\u00E9"
string.encoding #=> #<Encoding:UTF-8>
string = string.encode!(Encoding::ISO_8859_1) #=> "R\xE9sum\xE9"
string.encoding #=> #<Encoding:ISO-8859-1>
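As a rough illustration of the three routes mentioned above, the sketch below compares encode (returns a new string), encode! (transcodes in place) and a plain Encoding::Converter. `to_latin1` is a hypothetical helper name; the converter is created without any options, which is only one of the configurations the class supports.

```ruby
# Hypothetical helper: transcodes UTF-8 text to ISO-8859-1 through an
# explicit converter object instead of String#encode.
def to_latin1(str)
  Encoding::Converter.new("UTF-8", "ISO-8859-1").convert(str)
end

resume = "R\u00E9sum\u00E9".dup               # dup so the literal can be mutated below
copy   = resume.encode(Encoding::ISO_8859_1)  # returns a new string

p resume.encoding.name   # => "UTF-8", the receiver is unchanged
p copy.encoding.name     # => "ISO-8859-1"
p copy.bytes             # => [82, 233, 115, 117, 109, 233]
p to_latin1(resume).bytes == copy.bytes  # => true

resume.encode!(Encoding::ISO_8859_1)          # transcodes the receiver in place
p resume.encoding.name   # => "ISO-8859-1"
```

For one-off conversions String#encode is usually enough; a converter object is mainly of interest when the same conversion is applied repeatedly or to data that arrives in chunks.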