How to force PyYAML to load strings as unicode objects?


Here's a version which overrides PyYAML's handling of strings so that it always outputs unicode. In practice it probably gives the same result as the other answer I posted, only shorter (i.e. if you use custom handlers, you still need to make sure that strings in custom classes are converted to unicode, or pass unicode strings yourself):

    # -*- coding: utf-8 -*-
    import yaml
    from yaml import Loader, SafeLoader

    def construct_yaml_str(self, node):
        # Override the default string handling function
        # to always return unicode objects
        return self.construct_scalar(node)

    Loader.add_constructor(u'tag:yaml.org,2002:str', construct_yaml_str)
    SafeLoader.add_constructor(u'tag:yaml.org,2002:str', construct_yaml_str)

    print yaml.load(u"""---
    - spam
    - eggs
    - bacon
    - crème brûlée
    - spam
    """)

(The above gives [u'spam', u'eggs', u'bacon', u'cr\xe8me br\xfbl\xe9e', u'spam'])
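The snippet above registers the constructor on `Loader` and `SafeLoader` themselves, so every later `yaml.load` call in the process is affected. If that is too broad, the same override can probably be scoped to a subclass instead. This is only a sketch: `UnicodeSafeLoader` is a name made up for illustration, and it assumes PyYAML is installed and that `add_constructor` on a subclass leaves the parent loader's constructors alone.

```python
import yaml


class UnicodeSafeLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader subclass that carries the string override."""


def construct_yaml_str(loader, node):
    # Return the scalar as-is instead of trying to squeeze it into a byte str
    return loader.construct_scalar(node)


# Register on the subclass only (assumption: the stock SafeLoader is untouched)
UnicodeSafeLoader.add_constructor(u'tag:yaml.org,2002:str', construct_yaml_str)

data = yaml.load(u"- spam\n- cr\xe8me br\xfbl\xe9e\n", Loader=UnicodeSafeLoader)
```

On Python 2 every item in `data` should then be a unicode object, while code that uses the stock loaders keeps its old behaviour.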

I haven't tested it with LibYAML (the C-based parser), since I couldn't get it to compile, so I'll leave the other answer as it was.


Here's a function you could use to replace str objects with unicode objects in the decoded output of PyYAML:

    def make_str_unicode(obj):
        t = type(obj)

        if t in (list, tuple):
            if t == tuple:
                # Convert to a list if a tuple to
                # allow assigning to when copying
                is_tuple = True
                obj = list(obj)
            else:
                # Otherwise just do a quick slice copy
                obj = obj[:]
                is_tuple = False

            # Copy each item recursively
            for x in xrange(len(obj)):
                obj[x] = make_str_unicode(obj[x])

            if is_tuple:
                # Convert back into a tuple again
                obj = tuple(obj)

        elif t == dict:
            # Build a new dict: assigning obj[unicode(k)] in place would keep
            # the old str key, because 'a' == u'a' and both hash the same
            new_obj = {}
            for k in obj:
                v = make_str_unicode(obj[k])
                if type(k) == str:
                    # Make dict keys unicode
                    k = unicode(k)
                new_obj[k] = v
            obj = new_obj

        elif t == str:
            # Convert strings to unicode objects
            obj = unicode(obj)
        return obj

    print make_str_unicode({'blah': ['the', 'quick', u'brown', 124]})
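For code that also has to run on Python 3, where `str` is already unicode and byte strings are `bytes`, the same recursive idea might look roughly like this. It is only a sketch: `to_text` is a made-up name, and decoding as UTF-8 is an assumption (the function above relies on the default ASCII codec instead).

```python
import sys

# Text type: `unicode` on Python 2, `str` on Python 3
text_type = unicode if sys.version_info[0] == 2 else str
# Byte-string type: `bytes` is just an alias of `str` on Python 2
binary_type = bytes


def to_text(obj, encoding="utf-8"):
    """Return a copy of obj with every byte string decoded to text.

    The UTF-8 default is an assumption; pass whatever encoding the data uses.
    """
    if isinstance(obj, binary_type):
        return obj.decode(encoding)
    if isinstance(obj, dict):
        # Build a new dict so that keys are converted as well as values
        return dict((to_text(k, encoding), to_text(v, encoding))
                    for k, v in obj.items())
    if isinstance(obj, list):
        return [to_text(item, encoding) for item in obj]
    if isinstance(obj, tuple):
        return tuple(to_text(item, encoding) for item in obj)
    # Numbers, None, text strings, etc. are returned unchanged
    return obj


converted = to_text({b'blah': [b'the', u'quick', (b'brown', 124)]})
```

Unlike the version above, this one never modifies its argument, and it also decodes dict keys recursively rather than only top-level str keys.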