Validating a yaml document in python


Given that JSON and YAML are pretty similar beasts, you could make use of JSON-Schema to validate a sizable subset of YAML. Here's a code snippet (you'll need PyYAML and jsonschema installed):

    from jsonschema import validate
    import yaml

    schema = """
    type: object
    properties:
      testing:
        type: array
        items:
          enum:
            - this
            - is
            - a
            - test
    """

    good_instance = """
    testing: ['this', 'is', 'a', 'test']
    """

    # yaml.safe_load is used because a bare yaml.load() without a Loader
    # is rejected by current PyYAML releases
    validate(yaml.safe_load(good_instance), yaml.safe_load(schema))  # passes

    # Now let's try a bad instance...
    bad_instance = """
    testing: ['this', 'is', 'a', 'bad', 'test']
    """

    validate(yaml.safe_load(bad_instance), yaml.safe_load(schema))

    # Fails with:
    # ValidationError: 'bad' is not one of ['this', 'is', 'a', 'test']
    #
    # Failed validating 'enum' in schema['properties']['testing']['items']:
    #     {'enum': ['this', 'is', 'a', 'test']}
    #
    # On instance['testing'][3]:
    #     'bad'

One problem with this is that if your schema spans multiple files and you use "$ref" to reference the other files then those other files will need to be JSON, I think. But there are probably ways around that. In my own project, I'm playing with specifying the schema using JSON files whilst the instances are YAML.
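For what it's worth, here is a minimal sketch of that mixed setup, with the schema as JSON and the instance as YAML. The schema text and the `is_valid` helper are made up for illustration, and the schema is inlined as a string here rather than read from a `.json` file:

```python
import json

import yaml
from jsonschema import ValidationError, validate

# Hypothetical schema written as JSON text; in a real project this
# would more likely be read from a .json file.
SCHEMA_JSON = """
{
  "type": "object",
  "properties": {
    "testing": {
      "type": "array",
      "items": {"enum": ["this", "is", "a", "test"]}
    }
  }
}
"""


def is_valid(yaml_text, schema_json=SCHEMA_JSON):
    """Return True if the YAML text satisfies the JSON-encoded schema."""
    schema = json.loads(schema_json)       # schema comes from JSON
    instance = yaml.safe_load(yaml_text)   # instance comes from YAML
    try:
        validate(instance, schema)
    except ValidationError:
        return False
    return True
```

Both parsers produce ordinary Python dicts and lists, which is why jsonschema does not care which format each side started in.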


I find Cerberus to be very reliable and straightforward to use, with great documentation.

Here is a basic implementation example:

my_yaml.yaml:

    name: 'my_name'
    date: 2017-10-01
    metrics:
        percentage:
            value: 87
            trend: stable

Defining the validation schema in schema.py:

    {
        'name': {
            'required': True,
            'type': 'string'
        },
        'date': {
            'required': True,
            'type': 'date'
        },
        'metrics': {
            'required': True,
            'type': 'dict',
            'schema': {
                'percentage': {
                    'required': True,
                    'type': 'dict',
                    'schema': {
                        'value': {
                            'required': True,
                            'type': 'number',
                            'min': 0,
                            'max': 100
                        },
                        'trend': {
                            'type': 'string',
                            'nullable': True,
                            # the inline (?i) flag must come first;
                            # Python 3.11+ rejects it mid-pattern
                            'regex': '(?i)^(down|equal|up)$'
                        }
                    }
                }
            }
        }
    }

Using PyYAML to load the YAML document:

    import yaml

    def load_doc():
        # safe_load builds only plain Python types (dict, list, str, date, ...)
        with open('./my_yaml.yaml', 'r') as stream:
            return yaml.safe_load(stream)

Now, validating the yaml file is straightforward:

    from cerberus import Validator

    # schema.py contains a single dict literal; eval turns it into a dict
    with open('./schema.py', 'r') as f:
        schema = eval(f.read())

    v = Validator(schema)
    doc = load_doc()
    print(v.validate(doc, schema))
    print(v.errors)
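Since schema.py holds nothing but a dict literal, `ast.literal_eval` from the standard library could probably stand in for `eval`, so that the file cannot run arbitrary code. A rough sketch, where `SCHEMA_SOURCE` is a shortened, hypothetical stand-in for the file's text and `load_schema` is a helper name invented here:

```python
import ast

# Hypothetical, shortened stand-in for the contents of schema.py.
SCHEMA_SOURCE = """{
    'name': {'required': True, 'type': 'string'},
    'date': {'required': True, 'type': 'date'},
}"""


def load_schema(source):
    """Parse a dict literal without executing any code it might contain."""
    schema = ast.literal_eval(source.strip())
    if not isinstance(schema, dict):
        raise TypeError("schema must be a dict literal")
    return schema


schema = load_schema(SCHEMA_SOURCE)
```

`literal_eval` accepts only literals (strings, numbers, booleans, `None`, dicts, lists, tuples, sets), so a schema containing function calls or names would be rejected with a `ValueError` rather than executed.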

Keep in mind that Cerberus is a format-agnostic data validation tool: it validates plain Python data structures, which means that it can support formats other than YAML, such as JSON, XML and so on, once they have been parsed.


Try Rx; it has a Python implementation and works on both JSON and YAML.

From the Rx site:

"When adding an API to your web service, you have to choose how to encode the data you send across the line. XML is one common choice for this, but it can grow arcane and cumbersome pretty quickly. Lots of webservice authors want to avoid thinking about XML, and instead choose formats that provide a few simple data types that correspond to common data structures in modern programming languages. In other words, JSON and YAML.

Unfortunately, while these formats make it easy to pass around complex data structures, they lack a system for validation. XML has XML Schemas and RELAX NG, but these are complicated and sometimes confusing standards. They're not very portable to the kind of data structure provided by JSON, and if you wanted to avoid XML as a data encoding, writing more XML to validate the first XML is probably even less appealing.

Rx is meant to provide a system for data validation that matches up with JSON-style data structures and is as easy to work with as JSON itself."