XML is not bad, just misused

There is the usual grumbling on Hacker News about XML.

The problem isn’t with XML; the problem is with the way it is used (or rather misused).

What are XML’s strengths?

  1. Valid files are definable, i.e. this element has child elements, or this element must contain only digits.
  2. Textual: errors are fixable with text editing tools.
  3. Textual: compresses well with gzip.
  4. All that “cruft” and “verboseness” makes XML files self-documenting. An XML element that looks like this:

    <FirstName>Pat</FirstName>

    clearly contains a first name. A developer reading the XML can figure out the meaning of the data and extract the information.

    Binary formats are definitely NOT self-documenting. Once the document describing the binary data layout is lost, the data is lost even if the file is still present. For binary formats, the “definition document” is often the code that created the file. Lose the program, or the hardware that can run the program, and the data is lost.

    JSON is NOT self-documenting either. Did XML need to be heavy with the open/close tag syntax? Probably not, but gzip exists for a reason.

  5. Interchange: a standards body can use it to define industry-standard interchange formats, for example, for filing information with the Securities and Exchange Commission.
  6. Persistence: data lasts across years in a way that will survive the programs that created it (SEC filings, government filings, drug testing data, etc.), or even a standard way to describe the Bible.
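Points 1 and 4 can be sketched in a few lines of Python. This is only a hand-rolled illustration using the standard library, not a real schema language such as DTD or XML Schema. The `<Person>` and `<Age>` elements and the function names `check_person` and `first_name` are assumptions made up for the example; only `<FirstName>` comes from the text above.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical record: only <FirstName> appears in the post itself.
RECORD = """<Person>
  <FirstName>Pat</FirstName>
  <Age>42</Age>
</Person>"""


def check_person(xml_text):
    """Return a list of rule violations for a hypothetical <Person> record."""
    root = ET.fromstring(xml_text)
    problems = []
    # Rule 1: "this element has child elements".
    if len(root) == 0:
        problems.append("Person has no child elements")
    # Rule 2: "this element must contain only digits".
    age = root.findtext("Age", default="")
    if not re.fullmatch(r"[0-9]+", age):
        problems.append("Age must contain only digits, got %r" % age)
    return problems


def first_name(xml_text):
    """The tag name alone tells a reader where the first name lives."""
    return ET.fromstring(xml_text).findtext("FirstName")


print(check_person(RECORD))
print(first_name(RECORD))
```

In practice the rules would live in a schema document rather than in code, so that they outlive any one program that reads the file.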

Losing data is not that big a deal for many applications. Does it really matter if Google can’t read old search history logs from 5 years ago? Probably not. However, it does matter if the SEC can’t pull up securities filings from 5 years ago, because a filing might be relevant to a criminal probe of a bank (hah-hah).

Ask NASA how much they wish they could have recovered lost satellite data from the 1970s, especially when trying to figure out the rate of climate change. Bitrot and technology obsolescence are problems that have been talked about for many years.

XML is really appropriate for the data persistence situation, where there is high value in the data being accessible years later.
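As a rough sketch of the "gzip exists for a reason" argument in an archival setting, the Python below builds a repetitive XML document, compresses it, and checks that it comes back byte for byte and still parses. The `<People>` and `<Person>` element names and the `build_records` helper are hypothetical, and the actual size savings depend entirely on the data.

```python
import gzip
import xml.etree.ElementTree as ET


def build_records(count):
    """Build a hypothetical <People> document with `count` <Person> entries."""
    root = ET.Element("People")
    for i in range(count):
        person = ET.SubElement(root, "Person")
        ET.SubElement(person, "FirstName").text = "Pat%d" % i
    return ET.tostring(root, encoding="utf-8")  # bytes


raw = build_records(1000)
packed = gzip.compress(raw)         # what would sit in the archive
restored = gzip.decompress(packed)  # what a reader gets back years later

print("raw bytes:   ", len(raw))
print("gzip bytes:  ", len(packed))
print("round trip ok:", restored == raw)
print("records read: ", len(ET.fromstring(restored)))
```

The repeated open/close tags are exactly the kind of redundancy a general-purpose compressor removes, so the verbosity costs little at rest while the uncompressed text stays readable.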

XML is NOT a good choice for:

  1. API calls. No company should use XML-RPC. Tighter formats exist. XML is hard to parse on small devices. Just use HTTP semantics.
  2. Files whose loss is inconsequential.
  3. Data serialization.
  4. Transferring data between systems under the control of a single entity.
  5. Any situation that requires speedy parsing or generation of data.

There is a reason why XHTML didn’t last. XML and its constraints are not suitable for the transient world of web pages.

Like any other tool, XML’s purpose and limitations should be known and respected.

Don’t complain about XML being a poor solution for a problem it was not intended to solve.

Posted in rants, technical.

