YAML could do better. Please try again (TOML)

Here’s a blog post by Ruud van Asseldonk, entitled ‘The YAML document from hell’ (thanks to Randy Bush, who sent it on a private channel), which does a really good job of summing up why YAML is not what it should be.

Ruud’s concerns with YAML can be summarized as:

Strings of digits are parsed as numbers if they look like floating-point values (because strings are not necessarily quoted).
Implicit parsing of sexagesimal (base 60) number literals (this has to be one of the most bizarre choices made in this standard).
Because string quoting is optional, if you work with ISO 3166 two-letter country codes, you will find NO (Norway) promoted to the boolean ‘false’ value.
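
To make these three pitfalls concrete, here is a minimal sketch of the kind of type guessing a YAML 1.1-style parser applies to unquoted values. It is a toy, not a real parser: the function name `resolve_plain_scalar` and the regular expressions are my own simplifications for illustration, and a real library covers many more cases.

```python
import re


def _case_variants(words):
    # YAML 1.1 accepts lower, Capitalized and UPPER spellings only.
    return {v for w in words for v in (w, w.capitalize(), w.upper())}


# Simplified patterns, assumed for illustration (not taken from any library).
TRUE_WORDS = _case_variants(["y", "yes", "true", "on"])
FALSE_WORDS = _case_variants(["n", "no", "false", "off"])
SEXAGESIMAL = re.compile(r"[-+]?[1-9][0-9_]*(:[0-5]?[0-9])+")
DECIMAL_INT = re.compile(r"[-+]?(0|[1-9][0-9_]*)")
FLOAT = re.compile(r"[-+]?[0-9][0-9_]*\.[0-9]*([eE][-+]?[0-9]+)?")


def resolve_plain_scalar(text):
    """Guess the type of an unquoted scalar the way a YAML 1.1 parser might."""
    if text in TRUE_WORDS:
        return True
    if text in FALSE_WORDS:
        return False
    digits = text.replace("_", "")
    if SEXAGESIMAL.fullmatch(text):
        # Each colon-separated group is one base-60 "digit".
        sign = -1 if digits.startswith("-") else 1
        value = 0
        for part in digits.lstrip("+-").split(":"):
            value = value * 60 + int(part)
        return sign * value
    if DECIMAL_INT.fullmatch(text):
        return int(digits)
    if FLOAT.fullmatch(text):
        return float(digits)
    return text  # anything else stays a string


for raw in ["NO", "22:22", "1.10", "SE"]:
    print(f"{raw!r:>8} -> {resolve_plain_scalar(raw)!r}")
```

In this sketch the country code NO comes back as a boolean, a value like 22:22 becomes the integer 1342, and a version-like 1.10 becomes the float 1.1, while the neighbouring country code SE stays a string. That inconsistency between otherwise similar values is what makes the behaviour so easy to miss.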

There are several other problems. To be fair to both the YAML standard authors and Ruud, many of these problems are addressed by later revisions of the YAML specification, but they persist in older code and systems, so the risk of misapplying or misinterpreting a YAML configuration remains.

Ruud’s post does acknowledge those revisions, but the overall impression is that YAML is fragile and full of potential ‘foot cannons’, which, in the context in which it’s used (centralized configuration of options for parsing into a device- or system-specific configuration), is a huge problem. They’re called ‘foot cannons’ because it’s incredibly easy to ‘shoot yourself in the foot’ when using a system without understanding its pitfalls and risks. And because YAML is used to centrally manage LOTS of things, the effect of a mistake is very often magnified out into the network at large.

As the post says, ‘Templating YAML is a terrible, terrible idea’, and Ruud makes the obvious recommendation: use something else. He does suggest using a subset of YAML, but I tend to think either TOML or JSON is a better choice, for the reason Ruud notes: both have good support in Python (and by now, it shouldn’t be necessary to remind you to use Python 3, not an earlier version of Python).

For anyone reading or working in the IETF standards process, this is a bit of a concern because YAML is now part of work under consideration as a MIME encoding. As with Simple Network Management Protocol (SNMP), there seems to be a small group of experts who are literate in YAML and understand the pitfalls, but for most people who wind up having to specify a context-specific, locally applied YAML template, the risks are huge.

This isn’t just a theoretical risk. Another IETF standard descriptive mechanism is RFC 6020, ‘YANG – A Data Modeling Language for the Network Configuration Protocol (NETCONF)’, which is much more aligned to the XML format, and people are already coding translators between YANG and YAML because of the potential for automatic configuration.

YAML is now ubiquitous because of its widespread use in deployment technologies like Kubernetes and Helm. Why is this discussion relevant to the network? Because Kubernetes has a high dependency on RFC 1918 private addresses and intermediary proxy services in the form of the ingress controller, which enables multiple back-ends, multipoint delivery of services, and scaling. As more and more people move to automated deployment in the cloud and use this configuration methodology, more and more systems are going to be built to this moving-target notation, intersecting with the configuration of their network bindings.

The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.
