python3: bytes vs bytearray, and converting to and from strings
bytes and bytearrays are similar...
Python 3's bytes and bytearray classes both hold arrays of bytes, where each byte can take on a value between 0 and 255. The primary difference is that a bytes object is immutable, meaning that once created, you cannot modify its elements. By contrast, a bytearray object allows you to modify its elements.
Both bytes and bytearray provide ways to encode strings into bytes and to decode those bytes back into strings.
bytes and encoding strings
A bytes object can be constructed in a few different ways:
>>> bytes(5)
b'\x00\x00\x00\x00\x00'
>>> bytes([97, 98, 99])
b'abc'
>>> b'abc'
b'abc'
>>> bytes('abc')
TypeError: string argument without an encoding
>>> bytes('abc', 'utf-8')
b'abc'
>>> 'abc'.encode('utf-8')
b'abc'
>>> 'abc'.encode('utf-16')
b'\xff\xfea\x00b\x00c\x00'
>>> 'abc'.encode('utf-16-le')
b'a\x00b\x00c\x00'
Note the difference between the last two: 'utf-16' specifies a generic UTF-16 encoding, so its encoded form includes a two-byte "byte order mark" preamble of [0xff, 0xfe]. (That is the value on a little-endian machine; the generic codec writes in the platform's native byte order.) When you specify an explicit ordering such as 'utf-16-le', as in the last example, the encoded form omits the byte order mark.
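To see the byte order mark in isolation, here is a small sketch that compares the generic codec with the two explicit byte orders. The variable names are illustrative only, and codecs.BOM_UTF16_LE and codecs.BOM_UTF16_BE are the standard library's constants for the two possible marks.

```python
import codecs

text = 'abc'

# The generic codec prepends a BOM in the machine's native byte order.
with_bom = text.encode('utf-16')

# Explicit byte orders never add a BOM.
little = text.encode('utf-16-le')
big = text.encode('utf-16-be')

print(with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))  # True
print(little)                        # b'a\x00b\x00c\x00'
print(big)                           # b'\x00a\x00b\x00c'
print(len(with_bom) - len(little))   # 2

# Decoding with the generic codec reads the BOM to choose the byte order.
print(with_bom.decode('utf-16'))     # abc
```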
Because a bytes object is immutable, attempting to change one of its elements results in an error:
>>> a = bytes('abc', 'utf-8')
>>> a
b'abc'
>>> a[1] = 102
TypeError: 'bytes' object does not support item assignment
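If you do need a changed version of a bytes object, you have to build a new one. The following is a minimal sketch of two common ways to do that; the names original, scratch, changed and spliced are made up for this example.

```python
original = bytes('abc', 'utf-8')

# Indexing a bytes object yields an int, not a one-byte bytes object.
print(original[1])   # 98

# Option 1: copy into a mutable bytearray, edit, then convert back to bytes.
scratch = bytearray(original)
scratch[1] = 102     # 102 is the code for 'f'
changed = bytes(scratch)

# Option 2: assemble a new bytes object from slices.
spliced = original[:1] + b'f' + original[2:]

print(original)      # b'abc' (unchanged)
print(changed)       # b'afc'
print(spliced)       # b'afc'
```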
bytearray and encoding strings
Like bytes, a bytearray can be constructed in a number of ways:
>>> bytearray(5)
bytearray(b'\x00\x00\x00\x00\x00')
>>> bytearray([1, 2, 3])
bytearray(b'\x01\x02\x03')
>>> bytearray('abc')
TypeError: string argument without an encoding
>>> bytearray('abc', 'utf-8')
bytearray(b'abc')
>>> bytearray('abc', 'utf-16')
bytearray(b'\xff\xfea\x00b\x00c\x00')
>>> bytearray('abc', 'utf-16-le')
bytearray(b'a\x00b\x00c\x00')
Because a bytearray is mutable, you can modify its elements:
>>> a = bytearray('abc', 'utf-8')
>>> a
bytearray(b'abc')
>>> a[1]=114
>>> a
bytearray(b'arc')
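Item assignment is not the only in-place operation a bytearray supports. As a brief sketch using standard bytearray methods (the variable name buf is just illustrative), it can also grow and shrink:

```python
buf = bytearray('abc', 'utf-8')

buf.append(100)      # add a single byte (100 is the code for 'd')
buf.extend(b'ef')    # add several bytes at once
buf[0:2] = b'XYZ'    # slice assignment may change the length
del buf[-1]          # remove the last byte

print(buf)           # bytearray(b'XYZcde')
print(len(buf))      # 6
```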
appending bytes and bytearrays
bytes and bytearray objects may be concatenated with the + operator:
>>> a = bytes(3)
>>> a
b'\x00\x00\x00'
>>> b = bytearray(4)
>>> b
bytearray(b'\x00\x00\x00\x00')
>>> a+b
b'\x00\x00\x00\x00\x00\x00\x00'
>>> b+a
bytearray(b'\x00\x00\x00\x00\x00\x00\x00')
Note that the concatenated result takes on the type of the left-hand operand, so a+b produces a bytes object and b+a produces a bytearray.
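The same rule shows up with the += operator, and a str cannot be mixed in without encoding it first. A small sketch under those assumptions (all variable names here are hypothetical):

```python
a = bytes(3)
b = bytearray(4)

# The result type follows the left-hand operand.
left_bytes = a + b
left_bytearray = b + a
print(type(left_bytes).__name__)      # bytes
print(type(left_bytearray).__name__)  # bytearray

# += on a bytearray extends the existing object in place ...
b_alias = b
b += a
print(b_alias is b, len(b))           # True 7

# ... while += on bytes binds the name to a new object.
a_before = a
a += b'\x01'
print(a is a_before, a_before)        # False b'\x00\x00\x00'

# A str cannot be concatenated directly; encode it first.
try:
    a + 'abc'
except TypeError as exc:
    print('TypeError:', exc)
print(a + 'abc'.encode('utf-8'))      # b'\x00\x00\x00\x01abc'
```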
converting bytes and bytearray objects into strings
bytes and bytearray objects can be converted to strings using the decode method. The method has no way of knowing how the bytes were produced, so you must pass the same encoding that was used to encode the string. For example:
>>> a = bytes('abc', 'utf-8')
>>> a
b'abc'
>>> a.decode('utf-8')
'abc'
>>> b = bytearray('abc', 'utf-16-le')
>>> b
bytearray(b'a\x00b\x00c\x00')
>>> b.decode('utf-16-le')
'abc'
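It may help to see what happens when the encodings do not match. The sketch below is an illustration rather than an exhaustive list of failure modes; the variable names are made up, and errors='replace' is the standard optional argument to decode.

```python
le = bytearray('abc', 'utf-16-le')
with_bom = 'abc'.encode('utf-16')

# Matching encoding: the round trip returns the original string.
print(le.decode('utf-16-le'))     # abc

# A mismatched encoding can succeed silently and give the wrong text.
wrong = le.decode('utf-8')
print(repr(wrong))                # 'a\x00b\x00c\x00'

# Or it can fail outright: the BOM bytes are not valid UTF-8.
try:
    with_bom.decode('utf-8')
    failed = False
except UnicodeDecodeError:
    failed = True
print(failed)                     # True

# errors='replace' substitutes U+FFFD for undecodable bytes instead of raising.
lossy = with_bom.decode('utf-8', errors='replace')
print('\ufffd' in lossy)          # True
```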