Is XML too ‘verbose’?

As one of the most important and comprehensive systems languages, XML has enjoyed great popularity with CMS designers for the creation of structured writing systems.

As one of the most important and comprehensive languages for encoding text, XML has enjoyed great popularity as the basis for structured writing techniques and technologies. Yet even with continuous refinement and the broad adoption of data models like DITA for authoring and publishing, XML poses major challenges, especially when compared to so-called plain-text-formatting languages such as Markdown. Many, like The Content Wrangler’s Mark Baker, have criticized XML for its perceived limitations.

“XML’s complexity makes it hard to author native content.”

A ‘verbose’ language
In a post bluntly titled “Why XML Sucks,” Baker says that, while performing a vital function as the basis for structured writing systems, XML’s tagging – which he says makes XML “verbose” – inhibits author productivity.

“If you write in raw XML you are constantly having to type opening and closing tags, and even if your [XML] editor [application] helps you, you still have to think about tags all the time, even when just typing ordinary text structures like paragraphs and lists,” said Baker.

“And when you read, all of those tags get in the way of easily scanning or reading the text. Of course, the people you are writing for are not expected to read the raw XML, but as a writer, you have to read what you wrote.”

The absence of absence
Baker hangs a lantern on the issue of whitespace. He cites the original purpose of XML (“XML was designed as a data transport layer for the Web. It was supposed to replace HTML and perform the function now performed by JSON. It was for machines to talk to machines…”) as the reason why whitespace has no meaning in XML.

And what’s the big deal about whitespace? Says Baker, “…in actual writing, whitespace is the basic building block of structure. Hitting return to create a new paragraph is an ingrained behavior in all writers….”

He goes on: “This failure [of XML] to use whitespace to mean what whitespace means in ordinary documents is a major contributor to the verbosity of XML markup. It is why we need so many elements for ordinary text structures and why we need end tags for everything.”
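To make the whitespace point concrete, here is a minimal Python sketch, not taken from Baker’s post. It reads three paragraphs separated by blank lines, then wraps each one in an explicit element the way an XML vocabulary would. The sample text and the element names `section` and `p` are assumptions chosen for illustration; they are not drawn from DITA or any other real schema.

```python
import xml.etree.ElementTree as ET

# Sample content (hypothetical): blank lines are the only paragraph markers.
PLAIN_TEXT = """XML was designed for machines.

Writers hit return to start a new paragraph.

Whitespace carries that structure for free."""


def paragraphs_from_plain_text(text):
    """Split on blank lines, as a whitespace-aware plain-text format would."""
    return [block.strip() for block in text.split("\n\n") if block.strip()]


def paragraphs_to_xml(paragraphs):
    """Wrap each paragraph in an explicit <p> element inside a <section>.

    The element names are illustrative only, not a real document schema.
    """
    root = ET.Element("section")
    for para in paragraphs:
        ET.SubElement(root, "p").text = para
    return ET.tostring(root, encoding="unicode")


paragraphs = paragraphs_from_plain_text(PLAIN_TEXT)
xml_text = paragraphs_to_xml(paragraphs)

# Characters spent purely on structure in each representation.
content_length = sum(len(p) for p in paragraphs)
plain_overhead = len(PLAIN_TEXT) - content_length
markup_overhead = len(xml_text) - content_length

print(xml_text)
print("plain-text structural characters:", plain_overhead)
print("XML structural characters:", markup_overhead)
```

In the plain-text version the only structural characters are the blank lines between paragraphs. In the XML version every paragraph needs an opening and a closing tag, plus a wrapper element, which is the overhead Baker is describing.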

“XML performs a vital function…”

No ambiguity
While all this talk of verbosity and whitespace may seem fairly damning to the future of XML, the truth is that it serves a fundamental purpose, a “vital function,” as Baker puts it, one that many people rely on and that contributes to its longevity. As Charles Gordon of NetSilicon said in 2001, XML is “…a tool that concisely and unambiguously defines the format of data records.”

The “unambiguous” aspect is particularly important. While Baker may lament the loss of readability when viewing XML-encoded content in its raw form, the fact that XML requires authors to make conscious decisions about the structure of what they’re writing – even the placement of whitespace – makes every line purposeful. XML is ideal for communicating with unambiguous intent, which is the precise purpose of structured writing systems and rule-based content architecture. Raw XML is indeed verbose, but its general simplicity has made it a building block of so many improvements in technical communication that its use endures and even flourishes to this day.

2 thoughts on “Is XML too ‘verbose’?”

  1. Eric,

    The unambiguous aspect is indeed vital. XML is verbose enough to avoid ambiguity in all cases (if used appropriately). That is a useful property, but it also means that it brings that full verbosity to every application, whether that application needs it or not.

    In many cases you can be unambiguous for a particular purpose while being much less verbose. (Using whitespace to denote paragraphs, rather than paragraph tags, for instance.) This is why we have seen a growth of languages like Markdown. It can’t be used for nearly as many things as XML, but for the ones it can be used for, it is just as unambiguous.

    Same thing goes for formats like JSON and YAML: They can’t be used for as many things as XML, but they are less verbose and just as unambiguous for the things they can be used for.

    That tension is always going to be there: Do you choose the less capable, less verbose format that is adequate for the current project, or the more capable, more verbose format, whose verbosity you don’t actually need, but which may (for instance) have a more extensive tool chain available?

    Some people will certainly choose the latter. The verbosity and its consequences will still suck, even if overall it is the right choice. Going to the dentist still sucks, even if it is the right choice. For the people who have to put up with that suckiness, there may be some consolation in knowing why it has to suck so much, that it is not merely the consequence of bad design, but of the properties required to serve its larger purpose.

    1. Mark,

      Thanks for adding to the discussion. I think you’ve identified the unresolved tension, but I would phrase it this way: do you choose the less verbose format at the expense of linking mechanisms? So far, people are saying, “Yes, yes we will make that trade.” Markdown cannot do content linking in any way that approaches what can be done in DITA, and people are saying that’s OK…for now. And, they’re making this trade because the cost of adopting XML and its big-muscle linking capabilities is more than covered by the increased productivity that comes by pushing documentation capabilities out to the sources of information: engineers, end-users, customer-support teams, etc. Markdown makes it possible for all of these groups to contribute productively. Companies will continue to make that trade-off until the cost of content linkage is no longer covered.

