What is the difference between #encode and #force_encoding in Ruby?



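The difference is pretty big. force_encoding sets the given encoding on the string but does not change the string's bytes, i.e. it does not change its representation in memory. It only changes the encoding label attached to those bytes (in place, on the receiver):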

'łał'.bytes                         #=> [197, 130, 97, 197, 130]
'łał'.force_encoding('ASCII').bytes #=> [197, 130, 97, 197, 130]
'łał'.force_encoding('ASCII')       #=> "\xC5\x82a\xC5\x82"
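To make "same bytes, new label" concrete, here is a small sketch. It assumes a UTF-8 source file (the default since Ruby 2.0) and calls `dup` first because force_encoding modifies its receiver. Note how the relabeled string is no longer valid and is now counted byte by byte:

```ruby
# Sketch: force_encoding only relabels the bytes; it never rewrites them.
original  = 'łał'                               # UTF-8 literal: 3 characters, 5 bytes
relabeled = original.dup.force_encoding('ASCII') # dup, because force_encoding mutates its receiver

puts relabeled.bytes == original.bytes # true: identical bytes in memory
puts relabeled.encoding                # US-ASCII: only the label changed
puts relabeled.valid_encoding?         # false: bytes >= 128 are not valid US-ASCII
puts relabeled.length                  # 5: each byte is now treated as one character
puts original.encoding                 # UTF-8: the original is untouched thanks to dup
```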

encode assumes that the current encoding is correct and transcodes the string, i.e. rewrites its bytes so that it reads the same way in the target encoding:

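'łał'.encode('UTF-16').bytes #=> [254, 255, 1, 66, 0, 97, 1, 66]

The first two bytes (254, 255, i.e. 0xFE 0xFF) are the byte order mark Ruby emits for 'UTF-16'. After that, every character takes two bytes (ł is U+0142, so 1, 66), but the string still reads "łał".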
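A sketch of the same idea, using 'UTF-16BE' instead of 'UTF-16' (an assumption made only to get output without the byte order mark). It shows that encode produces new bytes, that converting back gives the original text, and that encode returns a new string rather than modifying the receiver:

```ruby
# Sketch: encode transcodes, i.e. rewrites the bytes so the text stays the same.
original = 'łał'                       # UTF-8 bytes: [197, 130, 97, 197, 130]
utf16    = original.encode('UTF-16BE') # big-endian UTF-16, no byte order mark

p utf16.bytes                          # [1, 66, 0, 97, 1, 66]
p utf16.encoding == Encoding::UTF_16BE # true
p utf16.encode('UTF-8') == original    # true: same text, different bytes
p original.bytes                       # [197, 130, 97, 197, 130]: encode did not touch the receiver
```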

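In short, force_encoding changes how the string's bytes are interpreted, while encode changes the bytes themselves so that the text reads the same. The latter works only if possible: by default, encode raises Encoding::UndefinedConversionError for characters the target encoding cannot represent.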
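The following sketch (using ISO-8859-1 purely as an example target) puts both methods side by side on the same text. Reaching for force_encoding when you meant encode is what produces the familiar "mojibake", and the "if possible" caveat shows up as an exception unless you pass the `undef:`/`replace:` options to encode:

```ruby
# Sketch contrasting encode and force_encoding on the same UTF-8 text.
text = 'Résumé' # UTF-8 source literal: é is the two bytes 0xC3 0xA9

# encode keeps the characters and rewrites the bytes (é becomes the single byte 0xE9).
latin1 = text.encode('ISO-8859-1')
p latin1.bytes # [82, 233, 115, 117, 109, 233]

# force_encoding with the wrong label keeps the bytes and garbles the characters:
# 0xC3 0xA9 is read as the two Latin-1 characters "Ã" and "©".
mislabeled = text.dup.force_encoding('ISO-8859-1')
p mislabeled.encode('UTF-8') # "RÃ©sumÃ©"

# "If possible": characters missing from the target encoding raise by default...
raised = begin
  'łał'.encode('ISO-8859-1')
  false
rescue Encoding::UndefinedConversionError
  true
end
p raised # true

# ...unless encode is told to substitute them.
replaced = 'łał'.encode('ISO-8859-1', undef: :replace, replace: '?')
p replaced # "?a?"
```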


Read the "Changing an encoding" section of the Ruby Encoding documentation:

The associated Encoding of a String can be changed in two different ways.

First, it is possible to set the Encoding of a string to a new Encoding without changing the internal byte representation of the string, with String#force_encoding. This is how you can tell Ruby the correct encoding of a string.

Example:

string = "R\xC3\xA9sum\xC3\xA9"
string.encoding                        #=> #<Encoding:ISO-8859-1>
string.force_encoding(Encoding::UTF_8) #=> "R\u00E9sum\u00E9"
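A typical case for "telling Ruby the correct encoding" is data that arrives labeled as binary even though you know it is UTF-8. In this sketch that situation is simulated with String#b, which returns an ASCII-8BIT (binary) copy. A real program would get such bytes from a file or socket read in binary mode:

```ruby
# Sketch: relabel bytes you know to be UTF-8 that arrived tagged as binary.
raw = "R\xC3\xA9sum\xC3\xA9".b # simulated input: 8 bytes labeled ASCII-8BIT
p raw.encoding == Encoding::BINARY # true

text = raw.force_encoding(Encoding::UTF_8) # mutates raw and returns that same object
p text.valid_encoding?                     # true: the bytes are well-formed UTF-8
p [text.length, text.bytesize]             # [6, 8]: 6 characters stored in 8 bytes
p text == 'Résumé'                         # true
```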

Second, it is possible to transcode a string, i.e. translate its internal byte representation to another encoding. Its associated encoding is also set to the other encoding. See String#encode for the various forms of transcoding, and the Encoding::Converter class for additional control over the transcoding process.

Example:

string = "R\u00E9sum\u00E9"
string.encoding                           #=> #<Encoding:UTF-8>
string = string.encode!(Encoding::ISO_8859_1) #=> "R\xE9sum\xE9"
string.encoding                           #=> #<Encoding:ISO-8859-1>