
Confused about StringIO, cStringIO and BytesIO


You should use io.StringIO for handling unicode objects and io.BytesIO for handling bytes objects in both Python 2 and 3. This keeps your code forwards-compatible, since these two classes are all that Python 3 has to offer.
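As a minimal sketch of that split (the variable names here are illustrative, not from the question), text goes into io.StringIO, bytes go into io.BytesIO, and any crossing between the two is done with an explicit encode or decode. UTF-8 is an assumption; use whatever encoding your data actually has.

```python
import io

# Unicode text (unicode on Python 2, str on Python 3)
text = u"\u2603 snowman"
# The same content as bytes; UTF-8 is an assumed encoding
raw = text.encode("utf-8")

text_buf = io.StringIO(text)   # accepts text only
bytes_buf = io.BytesIO(raw)    # accepts bytes only

# Each buffer hands back the same type it was given
text_out = text_buf.read()
bytes_out = bytes_buf.read()

# Moving between the two worlds needs an explicit decode (or encode)
decoded = io.BytesIO(raw).getvalue().decode("utf-8")
```

This should behave the same on Python 2.7 and Python 3, which is the point of preferring the io module.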


Here's a better test (for Python 2 and 3) that doesn't include the cost of converting from numpy to str/bytes:

import numpy as np
import string
b_data = np.random.choice(list(string.printable), size=1000000).tobytes()
u_data = b_data.decode('ascii')
u_data = u'\u2603' + u_data[1:]  # add a non-ascii character

And then:

import io
%timeit io.StringIO(u_data)
%timeit io.StringIO(b_data)
%timeit io.BytesIO(u_data)
%timeit io.BytesIO(b_data)

In Python 2, you can also test:

import StringIO, cStringIO
%timeit cStringIO.StringIO(u_data)
%timeit cStringIO.StringIO(b_data)
%timeit StringIO.StringIO(u_data)
%timeit StringIO.StringIO(b_data)

Some of these will fail: passing the wrong type raises a TypeError, and cStringIO complains about non-ASCII characters.
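The type-mismatch failures can be checked without timing anything. The sketch below is written for Python 3 only (so it leaves out the Python 2 StringIO and cStringIO modules); `try_construct` is a hypothetical helper that reports whether a constructor accepted its input.

```python
import io


def try_construct(factory, data):
    """Return "ok" if factory(data) succeeds, else the exception class name."""
    try:
        factory(data)
    except (TypeError, UnicodeError) as exc:
        return type(exc).__name__
    return "ok"


text = "\u2603 snowman"      # str containing a non-ASCII character
raw = text.encode("utf-8")   # bytes; UTF-8 is an assumed encoding

results = {
    "StringIO(text)": try_construct(io.StringIO, text),
    "StringIO(bytes)": try_construct(io.StringIO, raw),
    "BytesIO(text)": try_construct(io.BytesIO, text),
    "BytesIO(bytes)": try_construct(io.BytesIO, raw),
}

for label, outcome in results.items():
    print(label, "->", outcome)
```

On Python 3 only the matching pairs succeed; the two mismatched calls raise TypeError.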


Python 3.5 results:

>>> %timeit io.StringIO(u_data)
100 loops, best of 3: 8.61 ms per loop
>>> %timeit io.StringIO(b_data)
TypeError: initial_value must be str or None, not bytes
>>> %timeit io.BytesIO(u_data)
TypeError: a bytes-like object is required, not 'str'
>>> %timeit io.BytesIO(b_data)
The slowest run took 6.79 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 344 ns per loop

Python 2.7 results (run on a different machine):

>>> %timeit io.StringIO(u_data)
1000 loops, best of 3: 304 µs per loop
>>> %timeit io.StringIO(b_data)
TypeError: initial_value must be unicode or None, not str
>>> %timeit io.BytesIO(u_data)
TypeError: 'unicode' does not have the buffer interface
>>> %timeit io.BytesIO(b_data)
10000 loops, best of 3: 77.5 µs per loop
>>> %timeit cStringIO.StringIO(u_data)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2603' in position 0: ordinal not in range(128)
>>> %timeit cStringIO.StringIO(b_data)
1000000 loops, best of 3: 448 ns per loop
>>> %timeit StringIO.StringIO(u_data)
1000000 loops, best of 3: 1.15 µs per loop
>>> %timeit StringIO.StringIO(b_data)
1000000 loops, best of 3: 1.19 µs per loop
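%timeit is an IPython magic, so the commands above only work in IPython or Jupyter. If you want to repeat the measurement from a plain script, a rough equivalent using the standard timeit module might look like this. It is a Python 3 sketch: `time_constructor` is a hypothetical helper, and the data is smaller than in the test above and built without numpy, so the numbers are not comparable with the results listed here.

```python
import io
import string
import timeit


def time_constructor(factory, data, number=100, repeat=3):
    """Return the best observed time in seconds for one factory(data) call."""
    runs = timeit.repeat(lambda: factory(data), number=number, repeat=repeat)
    return min(runs) / number


# Roughly 100,000 characters of printable ASCII (an arbitrary, assumed size)
u_data = string.printable * 1000
u_data = "\u2603" + u_data[1:]     # add a non-ASCII character
b_data = u_data.encode("utf-8")    # bytes version of the same content

string_io_time = time_constructor(io.StringIO, u_data)
bytes_io_time = time_constructor(io.BytesIO, b_data)

print("io.StringIO(u_data): %.3g s per call" % string_io_time)
print("io.BytesIO(b_data):  %.3g s per call" % bytes_io_time)
```

Taking the minimum over several repeats mirrors the "best of 3" figure that %timeit reports.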