How do I include unicode strings in Python doctests?


If you want unicode strings, you have to use unicode docstrings! Mind the u!

    # -*- coding: utf-8 -*-

    def mylen(word):
      u"""        <----- SEE 'u' HERE
      >>> mylen(u"áéíóú")
      5
      """
      return len(word)

    print mylen(u"áéíóú")

This will work -- as long as the tests pass. For Python 2.x you need yet another hack to make verbose doctest mode work or get correct tracebacks when tests fail:

    if __name__ == "__main__":
        import sys
        reload(sys)
        sys.setdefaultencoding("UTF-8")
        import doctest
        doctest.testmod()

NB! Only ever use setdefaultencoding for debug purposes. I'd accept it for doctest use, but not anywhere in your production code.


Python 2.6.6 does not handle unicode output in doctests very well, but this can be fixed by combining:

  • the sys.setdefaultencoding("UTF-8") hack described above
  • a unicode docstring (also mentioned above, thanks a lot)
  • AND a print statement.

In my case, doctest reports the test in this docstring as failing:

    def beatiful_units(*units):
        u'''Returns nice string like 'erg/(cm² sec)'.
        >>> beatiful_units(('erg', 1), ('cm', -2), ('sec', -1))
        u'erg/(cm² sec)'
        '''

with "error" message

    Failed example:
        beatiful_units(('erg', 1), ('cm', -2), ('sec', -1))
    Expected:
        u'erg/(cm² sec)'
    Got:
        u'erg/(cm\xb2 sec)'

Using print we can fix that:

    def beatiful_units(*units):
        u'''Returns nice string like 'erg/(cm² sec)'.
        >>> print beatiful_units(('erg', 1), ('cm', -2), ('sec', -1))
        erg/(cm² sec)
        '''
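
The reason print helps is that doctest compares the expected text with the repr of a bare expression, and the Python 2 repr escapes non-ASCII characters, while print writes the characters themselves. A rough way to see the difference from Python 3 is the built-in ascii(), which escapes non-ASCII characters much like the Python 2 repr did (without the u prefix). This is only an illustration written for Python 3, not the code from the answer:

```python
# Python 3 sketch: ascii() escapes non-ASCII characters,
# similar to what repr() produced in Python 2.
units = "erg/(cm² sec)"

escaped = ascii(units)  # escaped form, like the "Got:" line in the failure report
shown = str(units)      # the text that print writes

print(escaped)  # 'erg/(cm\xb2 sec)'
print(shown)    # erg/(cm² sec)
```

The escaped form is what ends up on the "Got:" side of the failure report, which is why the literal ² in the expected output never matches without print.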


This appears to be a known and as yet unresolved issue in Python. See open issues here and here.

Not surprisingly, it can be modified to work OK in Python 3 since all strings are Unicode there:

    def mylen(word):
      """
      >>> mylen("áéíóú")
      5
      """
      return len(word)

    print(mylen("áéíóú"))
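
In Python 3 the repr of a string also keeps printable non-ASCII characters as they are, so an expected value can be written as a plain quoted string without print. The sketch below checks this by running a docstring through doctest programmatically. The run_docstring helper is a hypothetical name introduced for this example, and the second doctest line is an added illustration, not part of the original answer:

```python
import doctest


def mylen(word):
    """Return the number of characters in word.

    >>> mylen("áéíóú")
    5
    >>> "áéíóú".upper()
    'ÁÉÍÓÚ'
    """
    return len(word)


def run_docstring(func):
    """Hypothetical helper: run func's doctests, return (failed, attempted)."""
    # Parse the docstring into a DocTest, giving the examples access to func.
    test = doctest.DocTestParser().get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    # Run the examples quietly; failures would be reported on stdout.
    result = doctest.DocTestRunner(verbose=False).run(test)
    return result.failed, result.attempted


print(run_docstring(mylen))
```

In everyday use, calling doctest.testmod() under if __name__ == "__main__": does the same job for a whole module; the helper only makes the pass/fail counts easy to inspect.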