YAML is great, but do not use it

YAML · 2019/03/12

This article was originally published on medium.com/@P0lip.

Clickbaity title: check 😆.

Surprisingly often, I come across people who prefer YAML to JSON. This seems to be particularly common in the API space, which I’ve had the privilege of contributing to for the last couple of months. To be frank, I couldn’t quite see their reasoning for choosing YAML over JSON, so I asked a few folks here and there. These are the most common reasons I heard:

human-readability,
ease of writing by hand, e.g. multiline strings,
maintainability — this is both previous points combined.

I’m sorry to say it, but this is bullshit. For real. While these claims might seem true most of the time, my experience has proved otherwise. They are almost entirely false, and I’ll prove that to you shortly. Yes, YAML does have a bunch of outstanding features, yet readability, portability, and maintainability certainly aren’t among its advantages. Moreover, writing it by hand is likely to be a painful journey too.

This post is also a way to express my frustration and anger. To add a bit of context: as said earlier, I’m working on a variety of API tooling at this stage of my career, so I’ve had the doubtful privilege of facing some YAML weirdnesses that most people might have no clue about and, in my opinion, should know about, as they may vastly affect their daily engineering life.

I must admit that I truly love YAML; it’s an awesome language. I don’t aim to rant about it. My intention is to call out people who advocate using YAML over JSON for the (wrong) reasons listed in the introduction.

I’ve come up with a few examples and organized them in a fairly random order. Note — I have plenty of other cases left in my pocket, but I’ll save them for the next post.

Let’s start with some basics.

Is this valid YAML?

{}:

Yes. Unlike JSON, YAML lets you use types other than strings as mapping keys, so another flow mapping is a perfectly valid key. Yeah, a flow sequence is a valid key as well.

Tools written in JS will not handle such keys most of the time. Although such a document could be represented using Map instances, I haven’t seen a tool that does it. The vast majority of tools use plain objects, thus losing the ability to store non-string mapping keys (JS object property keys can only be strings or symbols). Technically, you could do some crazy trickery using symbols and proxies, but well, that’s a whole different story for another post.
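The limitation is not really specific to JS. As a rough illustration in Python (picked only for the sake of the sketch; `freeze` is a hypothetical helper and not part of any YAML library), a native dict cannot use another dict or a list as a key either, so a loader targeting native types has to either reject such documents or convert the keys into something hashable first:

```python
def freeze(node):
    """Recursively turn dicts and lists into hashable tuples (hypothetical helper)."""
    if isinstance(node, dict):
        return ("map", tuple((freeze(k), freeze(v)) for k, v in node.items()))
    if isinstance(node, list):
        return ("seq", tuple(freeze(item) for item in node))
    return node


# A YAML document like `{}:` has an empty mapping as its only key.
try:
    document = {{}: None}  # a dict cannot be used as a dict key
except TypeError as error:
    print("native dict failed:", error)

# With the key frozen first, the structure can be represented.
document = {freeze({}): None}
print(document)
```

Whether a real tool takes this route, throws, or silently stringifies the key is entirely up to that tool.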

What’s this?

[
 y,
 Y,
 yes,
 Yes,
 YES,
 n,
 N,
 no,
 No,
 NO,
 on,
 On,
 ON,
 off,
 Off
]

It’s a piece of production code I’ve stumbled upon recently.

Looks cool, no? Yes, I love it too.

What’s the outcome of that? Depends, but in most cases* it’s going to be a sequence of boolean scalars.

(*) based on top 10 google results 😆

Alright, but what happens now?

%YAML 1.2
---
[y, Y, yes, Yes, YES, n, N, no, No, NO, on, On, ON, off, Off]

As of now, it’s supposed to be a sequence of string scalars. Quite a bit of a difference, right? Why is that? Well, they changed it in YAML 1.2.

The primary objective of this revision is to bring YAML into compliance with JSON as an official subset. YAML 1.2 is compatible with 1.1 for most practical applications — this is a minor revision. […] We have removed unique implicit typing rules and have updated these rules to align them with JSON’s productions. In this version of YAML, boolean values may be serialized as “true” or “false”; the empty scalar as “null”. […]
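To make the difference concrete, here is a minimal sketch that classifies the scalars from the list above using only the boolean patterns of the YAML 1.1 type repository and of the YAML 1.2 core schema. The `kind` function is a made-up name for illustration, it ignores every other type, and actual parsers do not always follow these patterns to the letter:

```python
import re

# Boolean pattern from the YAML 1.1 type repository.
YAML_1_1_BOOL = re.compile(
    r"y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF"
)
# Boolean pattern from the YAML 1.2 core schema.
YAML_1_2_CORE_BOOL = re.compile(r"true|True|TRUE|false|False|FALSE")

SCALARS = "y Y yes Yes YES n N no No NO on On ON off Off".split()


def kind(scalar, bool_pattern):
    """Classify a plain scalar as 'bool' or 'str' (sketch, ignores other types)."""
    return "bool" if bool_pattern.fullmatch(scalar) else "str"


print({kind(s, YAML_1_1_BOOL) for s in SCALARS})       # {'bool'}
print({kind(s, YAML_1_2_CORE_BOOL) for s in SCALARS})  # {'str'}
```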

I have no clue what the percentage of tools respecting that directive is, but I’d assume it’s rather low. If you use JS tooling such as js-yaml or @stoplight/yaml (which is a fork of the former and comes with a few extra thingies), they will support YAML 1.2.

Can the following YAML document be expressed in JSON?

2: 'bar'
'2': 'foo'

No. As stated above, YAML lets you use other scalar types, as well as maps, sequences, etc., as a mapping key, so in this case we have a map with two distinct entries: one keyed by the integer 2, the other by the string '2'. Depending on the converter you use, you’ll either get

{ '2': 'bar' }

or

{ '2': 'foo' }

or an exception if you use a more strict converter.
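The same collision can be reproduced without any YAML parser at all. The sketch below uses Python's standard json module on a dict that mirrors the document above; `reject_duplicates` is a hypothetical strict hook written for this example:

```python
import json

# Python dicts, like YAML mappings, can hold 2 and '2' as distinct keys.
document = {2: "bar", "2": "foo"}

# json.dumps turns the integer key into a string, producing duplicate names.
encoded = json.dumps(document)
print(encoded)              # {"2": "bar", "2": "foo"}

# A lenient reader silently keeps the last duplicate.
print(json.loads(encoded))  # {'2': 'foo'}


def reject_duplicates(pairs):
    """Strict object_pairs_hook that raises on repeated keys."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = value
    return result


try:
    json.loads(encoded, object_pairs_hook=reject_duplicates)
except ValueError as error:
    print(error)            # duplicate key: '2'
```

Either way, one of the two original entries cannot survive the round trip.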

Multiline strings

There is an entire generator out there for you to choose which style you prefer, since consistency is what we like the most! People do choose different styles because YAML is more user-friendly, so each user can choose their own syntax.

Are these two documents equal?

-a: true

versus

- a: true

No, they are not. The extra space makes a huge difference here. The first one is a map containing a single entry, with the string scalar “-a” as the key and the boolean scalar true as the value. The second is a sequence containing a map with a single entry, with the string scalar “a” as the key and the boolean scalar true as the value. I’m not sure, but I think that’s the whole reasoning behind the ease of writing: you just add or remove a single space and, ta-da, a mapping becomes a sequence or the other way round. Easy.

In JSON, it would look as follows:

{ "-a": true }

for the first example, and

[{ "a": true }]

for the last one. Yeah, it’s clearly a worse experience.
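The rule at play is that a leading “-” only starts a block sequence entry when it is followed by whitespace (or the end of the line); otherwise it is just the first character of a plain scalar. Below is a toy sketch of that single rule. It handles exactly these two one-line documents, both function names are made up for the example, and it is in no way a YAML implementation:

```python
import json


def is_sequence_entry(line):
    """True if the line starts a block sequence entry: '-' followed by whitespace or nothing."""
    stripped = line.lstrip(" ")
    return stripped.startswith("-") and stripped[1:2] in ("", " ", "\t")


def parse_line(line):
    """Parse a single 'key: true' line, optionally wrapped in a sequence entry (toy parser)."""
    if is_sequence_entry(line):
        # Drop the '-' indicator and parse the rest as the entry's content.
        return [parse_line(line.lstrip(" ")[1:].lstrip())]
    key, _, value = line.partition(": ")
    return {key.strip(): value.strip() == "true"}


print(json.dumps(parse_line("-a: true")))   # {"-a": true}
print(json.dumps(parse_line("- a: true")))  # [{"a": true}]
```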

Will the following document always be processed in the same way?

%YAML 1.2
---
a: .5
b: TRUE

No. 😆

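YAML 1.2 comes with a few recommended schemas for tag resolution. If you use the JSON schema, .5 and TRUE match none of its patterns and will typically end up with the tag:yaml.org,2002:str tag (strictly speaking, the spec says a processor should consider such scalars an error), yet if you pick the core schema, you’ll get tag:yaml.org,2002:float and tag:yaml.org,2002:bool respectively.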
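Here is a rough sketch of that resolution step, with the patterns transcribed from the tag resolution tables of the two schemas (the empty scalar and node kinds are left out for brevity). The `resolve` function and the way the tables are stored are my own choices for the example, and the string fallback mirrors the behavior described above rather than anything a strict processor is obliged to do:

```python
import re

STR = "tag:yaml.org,2002:str"

# Patterns transcribed from the YAML 1.2 recommended schemas (first match wins).
JSON_SCHEMA = [
    ("tag:yaml.org,2002:null", r"null"),
    ("tag:yaml.org,2002:bool", r"true|false"),
    ("tag:yaml.org,2002:int", r"-?(0|[1-9][0-9]*)"),
    ("tag:yaml.org,2002:float", r"-?(0|[1-9][0-9]*)(\.[0-9]*)?([eE][-+]?[0-9]+)?"),
]
CORE_SCHEMA = [
    ("tag:yaml.org,2002:null", r"null|Null|NULL|~"),
    ("tag:yaml.org,2002:bool", r"true|True|TRUE|false|False|FALSE"),
    ("tag:yaml.org,2002:int", r"[-+]?[0-9]+|0o[0-7]+|0x[0-9a-fA-F]+"),
    ("tag:yaml.org,2002:float",
     r"[-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?"
     r"|[-+]?\.(inf|Inf|INF)|\.(nan|NaN|NAN)"),
]


def resolve(scalar, schema):
    """Return the tag a plain scalar resolves to under the given schema table."""
    for tag, pattern in schema:
        if re.fullmatch(pattern, scalar):
            return tag
    # Assumed fallback; a strict processor may report an error instead.
    return STR


for scalar in (".5", "TRUE"):
    print(scalar, resolve(scalar, JSON_SCHEMA), resolve(scalar, CORE_SCHEMA))
```

Same document, same spec version, two different data models.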

Which one is the most “recommended” one? Let’s see.

Failsafe Schema
[…] It is therefore the recommended schema for generic YAML tools. […]
JSON Schema
[…] It is also strongly recommended that other schemas should be based on it.
Core Schema
[…] This is the recommended default schema that YAML processor should use unless instructed otherwise. It is also strongly recommended that other schemas should be based on it.

Yet again — YAML is more user-friendly since it leaves the call up to you. It’s not opinionated. Unlike this horrific JSON.

What’s the “value” of the mapping key here?

~:

One more time — it depends. If you use core schema for tag resolution, it’s going to be null. Otherwise, it should be treated as a plain string scalar.

What’s the value of the “foo” mapping?

foo:

Null. This makes sense, but if one had to write null explicitly, no harm would be done, right? This is a common mistake I’ve seen on a number of occasions: folks leave just a mapping key, move on to a different part of the document, and forget about it. That may eventually cause all sorts of nasty errors if the tool they use treats that mapping in a special way, e.g. when the property is part of some config.
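One cheap safety net is to scan the loaded document for nulls before acting on it. The following is only a sketch: `find_null_values` is a hypothetical helper, and the `config` dict stands in for whatever your YAML loader would hand back for a file with a few forgotten keys:

```python
def find_null_values(node, path=""):
    """Yield dotted paths of mapping entries whose value is None (hypothetical helper)."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else str(key)
            if value is None:
                yield child
            else:
                yield from find_null_values(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_null_values(item, f"{path}[{index}]")


# Stand-in for a loaded config in which `foo:` and friends were left empty.
config = {
    "foo": None,
    "server": {"host": "localhost", "port": None},
    "tags": [{"name": None}],
}
print(list(find_null_values(config)))  # ['foo', 'server.port', 'tags[0].name']
```

It cannot tell an intentional null from a forgotten one, of course, but at least the forgotten ones show up in a list.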

That’s about it.

Although certain examples I provided may seem ridiculous, I’ve seen a couple of them in the wild. That’s why I said the claim that YAML is super cool in comparison to JSON is, well, far from reality, to say the least. As mentioned in the introduction, I’ve got a couple of other cases to showcase, so a new post is likely at some point.

YAML is simply huge in comparison to JSON and there are plenty of features the vast majority of people, including myself, are unaware of. This is not a drawback per se, but one cannot claim it’s more readable if there are a couple of ways to write a multiline string… at least that’s my take on that.

That’s also a huge pain for all kinds of tools, whether it’s a codegen, a linter, a parser, and so on. I’m mostly speaking of tools written in JS. I’m not sure what the situation looks like in other languages, so take my words with a grain of salt here, but I’m afraid it might not be drastically better. Every tool accepting YAML needs to support the whole syntax, but that’s simply too expensive to maintain, so most of them treat YAML in a JSON-ish way, meaning it’s parsed and represented as a JS object literal.

Yet again, YAML is truly fantastic, but the claim that it’s a developer-friendly language is just a tad bold. It’s very error-prone and, what’s worse, you cannot really be sure how your document will be processed or, in the most extreme cases, whether it’ll be processed at all. This is the main takeaway of my post: YAML is a far more powerful language than JSON, and since with great power comes great responsibility, you need to be careful.

Does it mean you should avoid YAML at all costs? No. For instance, if your tool does not support any more reasonable format than YAML, or you already have plenty of YAML documents, you should consider sticking with YAML.

tl;dr: YAML is more readable and user-friendly. Go use YAML.