Cython: when should I define a string as char*, str, or bytes?


If a variable undergoes no further processing as a particular type, it is best, and fastest, not to type it at all, so that it is treated as a general-purpose PyObject*.

The str type is a special case which means bytes on Python 2 and unicode on Python 3.


So code that types a string as str and handles it as unicode will break on Python 2, where str means bytes.
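A quick runtime check makes the difference concrete. This minimal Python sketch (the names PY2 and STR_IS_BYTES are illustrative, not from any library) reports whether str and bytes are the same type on the running interpreter. On Python 3 they are distinct, so text held in a str cannot be used where bytes is expected without encoding it first.

```python
import sys

# True when running under Python 2, False under Python 3.
PY2 = sys.version_info[0] == 2

# On Python 2, str is an alias of bytes; on Python 3, str is the Unicode text type.
STR_IS_BYTES = str is bytes

if __name__ == "__main__":
    print("Python %d: str is bytes -> %s" % (sys.version_info[0], STR_IS_BYTES))
```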

Strings only need to be typed if they are to be converted to a C char* or a C++ std::string. In that case, use str to handle Python 2/3 compatibility, together with helper functions that convert to and from bytes and unicode, so that the value can then be converted to either char* or std::string.
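The helper functions mentioned above could look roughly like this. It is a minimal sketch: the names to_bytes, to_text and text_type are hypothetical, and UTF-8 is assumed as the default encoding. Because it is plain Python, the same code also compiles as part of a .pyx module.

```python
import sys

if sys.version_info[0] >= 3:
    text_type = str
else:  # Python 2 only; this branch is never evaluated on Python 3
    text_type = unicode  # noqa: F821


def to_bytes(value, encoding="utf-8"):
    """Return value as bytes, encoding text if necessary (hypothetical helper)."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, text_type):
        return value.encode(encoding)
    raise TypeError("expected bytes or text, got %s" % type(value).__name__)


def to_text(value, encoding="utf-8"):
    """Return value as text, decoding bytes if necessary (hypothetical helper)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, text_type):
        return value
    raise TypeError("expected bytes or text, got %s" % type(value).__name__)
```

In a Cython module, the result of to_bytes would be stored in a variable that stays alive for as long as any char* taken from it is in use, since the pointer refers to that bytes object's internal buffer.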

Typing strings is about interoperability with C/C++, not about speed as such. For example, Cython auto-converts a bytes object to a char* without copying when it sees cdef char* c_string = b_string, where b_string is a bytes object; the pointer is only valid while b_string is alive. The reverse direction, b_string = c_string[:b_len], copies the data into a new bytes object.
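Plain Python cannot hold a char* directly, but the standard-library ctypes module shows the same two directions. This is an analogue, not Cython itself: in CPython, ctypes.c_char_p built from a bytes object points into that object's buffer, while ctypes.string_at copies a given number of bytes into a new bytes object, much like c_string[:b_len] does in Cython. The variable names mirror the paragraph above and are otherwise arbitrary.

```python
import ctypes

b_string = b"hello world"

# In CPython, c_char_p built from bytes stores a pointer into b_string's own
# buffer (no copy) and keeps a reference so b_string stays alive.
c_string = ctypes.c_char_p(b_string)

# The address the pointer refers to, as a plain integer.
address = ctypes.cast(c_string, ctypes.c_void_p).value

# string_at copies b_len bytes starting at address into a new bytes object,
# similar to slicing a char* with c_string[:b_len] in Cython.
b_len = 5
copy = ctypes.string_at(address, b_len)

print(copy)
```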

On the other hand, if a string is typed but the typed value is never used as such, Cython performs a conversion from object to bytes/unicode that is not needed, which adds overhead.

This can be seen in the generated C code as calls to helpers such as __Pyx_PyObject_AsString and __Pyx_PyUnicode_FromString.

The same holds in general. The rule of thumb: if a specific type is not needed for further processing or conversion, it is best not to type the variable at all. Everything in Python is an object, so typing converts from the general-purpose PyObject* to something more specific.


Some quick testing showed that, in this particular case, only the str declaration worked; all the other options produced errors. Since the string is created elsewhere by Python 3 code, the str type declaration is evidently needed.
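Those errors are consistent with how Cython treats typed arguments: a parameter declared as bytes rejects a Python 3 str, which has to be encoded first. The sketch below is a plain-Python emulation, not Cython's generated code. The functions check_arg, takes_bytes and takes_str are hypothetical and only approximate the argument check Cython inserts for a signature such as def f(bytes s), which by default also lets None through.

```python
def check_arg(value, expected_type, name="s"):
    """Hypothetical, simplified stand-in for Cython's typed-argument check.

    As with Cython's default (no 'not None' clause), None is accepted.
    """
    if value is not None and not isinstance(value, expected_type):
        raise TypeError(
            "Argument '%s' has incorrect type (expected %s, got %s)"
            % (name, expected_type.__name__, type(value).__name__)
        )
    return value


def takes_bytes(s):
    """Plain-Python stand-in for a Cython 'def takes_bytes(bytes s)'."""
    return check_arg(s, bytes)


def takes_str(s):
    """Plain-Python stand-in for a Cython 'def takes_str(str s)'."""
    return check_arg(s, str)


if __name__ == "__main__":
    text = "hello"  # a Python 3 str created elsewhere
    takes_str(text)  # accepted
    try:
        takes_bytes(text)  # rejected: str is not bytes on Python 3
    except TypeError as exc:
        print(exc)
    takes_bytes(text.encode("utf-8"))  # accepted after explicit encoding
```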

Whether it is faster not to make any declaration at all remains an open question.