How to force PyYAML to load strings as unicode objects?
Here's a version that overrides PyYAML's string handling so that it always outputs unicode (by default, on Python 2, PyYAML returns str for ASCII-only scalars and unicode otherwise). In practice it probably gives the same result as the other answer I posted, just with less code (i.e. you still need to make sure that strings in custom classes are converted to unicode, or pass unicode strings yourself, if you use custom handlers):

    # -*- coding: utf-8 -*-
    import yaml
    from yaml import Loader, SafeLoader

    def construct_yaml_str(self, node):
        # Override the default string handling function
        # to always return unicode objects
        return self.construct_scalar(node)

    Loader.add_constructor(u'tag:yaml.org,2002:str', construct_yaml_str)
    SafeLoader.add_constructor(u'tag:yaml.org,2002:str', construct_yaml_str)

    print yaml.load(u"""---
    - spam
    - eggs
    - bacon
    - crème brûlée
    - spam
    """)

(The above gives [u'spam', u'eggs', u'bacon', u'cr\xe8me br\xfbl\xe9e', u'spam'].)
I haven't tested it with LibYAML (the C-based parser), though, as I couldn't compile it, so I'll leave the other answer as it was.
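
If patching the global Loader and SafeLoader classes feels too invasive, the same override could probably be scoped to a subclass instead. This is only a minimal sketch, assuming PyYAML is importable as yaml; UnicodeSafeLoader is a hypothetical name of my own, not something PyYAML ships, and like the version above it is untested against LibYAML:

```python
# -*- coding: utf-8 -*-
import yaml


class UnicodeSafeLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader subclass; the stock loaders stay untouched."""


def construct_yaml_str(loader, node):
    # Return the scalar as-is instead of letting PyYAML on Python 2
    # downgrade ASCII-only values to str
    return loader.construct_scalar(node)


# add_constructor is a classmethod, so calling it on the subclass
# registers the override for the subclass only
UnicodeSafeLoader.add_constructor(u'tag:yaml.org,2002:str', construct_yaml_str)

document = u"- spam\n- cr\xe8me br\xfbl\xe9e\n"
result = yaml.load(document, Loader=UnicodeSafeLoader)
```

Code that wants unicode everywhere then passes Loader=UnicodeSafeLoader explicitly, while anything else calling yaml.load or yaml.safe_load keeps the default behaviour.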
Here's a function you could use to replace str with unicode types in the decoded output of PyYAML:

    def make_str_unicode(obj):
        t = type(obj)

        if t in (list, tuple):
            # Rebuild the list or tuple with each item converted recursively
            obj = t(make_str_unicode(x) for x in obj)

        elif t == dict:
            # Build a new dict: assigning obj[unicode(k)] in place would keep
            # the original str key, because 'a' == u'a' and both hash alike
            obj = dict((make_str_unicode(k), make_str_unicode(v))
                       for k, v in obj.iteritems())

        elif t == str:
            # Convert strings to unicode objects (PyYAML only returns
            # str for ASCII-only values, so the default codec is enough)
            obj = unicode(obj)

        return obj

    print make_str_unicode({'blah': ['the', 'quick', u'brown', 124]})
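
To check that nothing was missed after the conversion, a small helper along these lines might be handy. It is a sketch rather than anything PyYAML provides: find_byte_strings and its path format are names I made up for illustration. It relies on bytes being an alias for str on Python 2, so the same code also runs on Python 3:

```python
def find_byte_strings(obj, path=u'root'):
    """Return the paths of all byte strings nested inside obj."""
    found = []
    if isinstance(obj, bytes):
        # On Python 2, bytes is an alias for str
        found.append(path)
    elif isinstance(obj, dict):
        for key, value in obj.items():
            # Check the key itself, then whatever it maps to
            found.extend(find_byte_strings(key, u'%s(key %r)' % (path, key)))
            found.extend(find_byte_strings(value, u'%s[%r]' % (path, key)))
    elif isinstance(obj, (list, tuple)):
        for index, item in enumerate(obj):
            found.extend(find_byte_strings(item, u'%s[%d]' % (path, index)))
    return found


# Hypothetical sample data with one byte string left in it
sample = {u'blah': [u'the', b'quick', u'brown', 124]}
leftovers = find_byte_strings(sample)
```

An empty list means every string in the lists, tuples and dicts it walked is unicode. Instances of custom classes are not inspected, so those still need checking by hand.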