Thursday, November 16, 2006

XML is not human readable

Over the last couple of weeks I've heard the same phrase said by about five different people:
"The advantage of XML is that it's human readable; this is why Web Services are better than previous technologies."
Now I'm not going to get into the readability of WSDL v IDL (hint: the winner isn't WSDL). But I think it's worth examining the whole concept of XML and whether it should be human readable, particularly when it comes to business processes, service descriptions and service contracts.

So should a "good" Web Service description be human readable? Let's examine the purpose of that description:

  1. To enable consumers to call the service correctly
  2. errr... that is pretty much it
So given that goal, what is the best way to show this to both systems and people? The answer is of course to have a common technical language that enables accurate exchange, and then have this rendered for different types of people and systems: Java tooling turns it into Java, C# tooling turns it into C#, and for people it gets rendered into a nice picture that shows the methods and the constraints.
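
To make the "render it for people" idea concrete, here is a minimal Python sketch, assuming a heavily trimmed and hypothetical WSDL 1.1 fragment (the GeoService, GetLocation and ListCountries names are invented for illustration, as is the summarize helper). It uses only the standard library's xml.etree.ElementTree and is not a real WSDL tool, just the shape of the idea: the XML stays the machine format and people get one plain line per operation.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"  # WSDL 1.1 namespace

# Hypothetical, heavily trimmed WSDL fragment, for illustration only.
SAMPLE_WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/" name="GeoService">
  <portType name="GeoPortType">
    <operation name="GetLocation">
      <input message="GetLocationRequest"/>
      <output message="GetLocationResponse"/>
    </operation>
    <operation name="ListCountries">
      <input message="ListCountriesRequest"/>
      <output message="ListCountriesResponse"/>
    </operation>
  </portType>
</definitions>"""


def message_name(element):
    """Return the message attribute, or '-' when the element is absent."""
    return element.get("message", "-") if element is not None else "-"


def summarize(wsdl_text):
    """Return one plain-text line per operation found in the WSDL."""
    ns = {"w": WSDL_NS}
    root = ET.fromstring(wsdl_text)
    lines = []
    for port_type in root.findall("w:portType", ns):
        for op in port_type.findall("w:operation", ns):
            request = message_name(op.find("w:input", ns))
            response = message_name(op.find("w:output", ns))
            lines.append(
                f"{port_type.get('name')}.{op.get('name')}: {request} -> {response}"
            )
    return lines


for line in summarize(SAMPLE_WSDL):
    print(line)
```

A real rendering would also need the message parts, types and bindings, which is exactly the detail that makes the raw document hard going for a person.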

WSDL and BPEL (especially BPEL) are examples of that technical language. There was never a goal for them to be human readable; they are aiming to be machine readable. The Geo Ripping WSDL is a very simple, self-contained example of why XML isn't designed for humans to read.

Sure, when it comes to debugging you can print out the XML and a skilled person can spot some of the errors, but then you could do this with RMI, CORBA, DCOM and even C (using a hex editor in the latter case). The idea of "human readable" is that anyone could read the SOAP messages or BPEL process context, and this 100% isn't true.

Or to look at it another way....

XML is not human readable, it's not designed to be human readable and you shouldn't try to make it human readable. Just because something is in Unicode doesn't mean that anyone can read it. French, Chinese, Klingon (WTF?), Japanese, German, English, Urdu and many other languages can be written in Unicode, and XML should be viewed in the same way, but as a language with lots of unnecessary syntax, no real semantics and pretty random grammar in general.

Think of XML as being English spoken by a sulky French teenager, lots and lots of grunts that mean nothing to anyone and the occasional fragment of something that no-one actually properly understands.

Reading BPEL is like trying to understand a conversation between a sulky French teenager and a sulky American teenager... in Chinese when they've only had two lessons and you don't speak Chinese.

XML is as hum



Anonymous said...

So why make these things in XML? Why not just use binary, rather than kill network bandwidth with XML? Is it that we want systems to be slower and kill the network?

Consumers can call services correctly without XML. Examples: RMI, CORBA, SQL, etc.

Anonymous said...

A couple more thoughts:

(1) Even if something is "human readable" that is, you understand all the tags etc, once you have a chunk of XML of some size - it's no longer readable. Too many lines, too much nesting, too many tags - quite simply you get overloaded. Human readable then is about more than the fact that you can actually _read_ it.

(2) XML encourages us to create properly and uniformly delimited protocols, something the likes of TCP/IP have been doing for some time. This makes writing parsers a little easier but isn't really a property of XML, as such delimiting can be done for either binary or text or XML.

(3) XML can be used to construct lowest common denominator type protocols where we use simple strings, ints and probably avoid longs (64-bit quantities don't work on all systems). This can be useful as there are plenty of tools to help with this, but it can be done by other means and in a lot of cases those means are lighter weight.

XML isn't really a cure-all for anything and it has no great advantages over everything else except maybe accessibility of tools across all platforms. It _can_ be used to create cross-platform formats, but the real art in creating such formats is making sure that the information you provide is simple enough to be consumable everywhere, which is an issue completely orthogonal to the use of XML.


Unknown said...

I think Dan has given a good reason as to why XML is working, and I'll add two more as to why we are all using XML these days rather than CORBA/IIOP

1) Microsoft: MS were there at the start of the creation of XML and WS, which meant that they didn't feel a need to invent something of their own

2) Marketing: the "Human Readable" myth is very powerful and makes non-technical architects and the like think of XML as "different" to other approaches. XML has had a massive marketing campaign from the likes of IBM, Oracle, Sun, MS, SAP, BEA, etc, etc, etc which means its brand is very strong.

The "best" technology is the one that most people support. This makes XML (IMO) the "best" technology for information exchange, but let's not kid ourselves that users will understand a BPEL context that is using WS-Security, WS-TX and WS-RM!

Anonymous said...

XML's main strength isn't readability, although it is a step up from many EDI formats! XML's main advantage is that it is a standard. It means readily available design- and run-time support, so I don't have to roll my own message formats (just schemata). I don't have to code/generate/test/maintain parsers, and discover what forms of AST work best for general application usage. It means not having to invent bizarre query mechanisms to test message content -- I have standardized, somewhat less bizarre query mechanisms I can use.
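
As a rough illustration of that point, here is a minimal Python sketch using only the standard library's xml.etree.ElementTree, which supports a limited XPath subset. The order document, its element names and its attribute names are all hypothetical; the point is only that neither the parser nor the query syntax had to be written by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical message, invented for illustration.
ORDER_XML = """<order id="42">
  <item sku="A1" qty="2"/>
  <item sku="B7" qty="1"/>
  <item sku="C3" qty="5"/>
</order>"""

root = ET.fromstring(ORDER_XML)  # off-the-shelf parser, nothing hand-rolled

# Path query: every direct <item> child of the root element.
skus = [item.get("sku") for item in root.findall("./item")]

# Attribute predicate, part of the XPath subset ElementTree supports.
single = root.find("./item[@sku='B7']")

# Walk all <item> elements and add up the quantities.
total_qty = sum(int(item.get("qty")) for item in root.iter("item"))

print(skus, single.get("qty"), total_qty)
```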

In short, XML saves me time designing and implementing messages and message handlers. What's more, it saves the chap I'm going to send the message to time as well. Does that mean any of the messages or message handlers are technically ideal? Most likely not, but they are good enough for most applications. And that sounds like the mark of a good standard.

As for readability, XML straddles that middle ground between "computer friendly" data formats, and "human friendly". As a consequence, it makes neither camp very happy: computers consume more resources handling XML than is ideal, and humans can find it quite difficult to parse XML well enough to extract meaning from it. Not human readable? No, just difficult. Humans have even been known to routinely and successfully edit XML documents, although never have I heard one assert that they enjoyed the process...
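
The resource half of that trade-off can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a benchmark: the sensor reading and its field names are made up, and the binary layout (unsigned 32-bit int, 64-bit float, bool, network byte order) is just one plausible choice.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical record, invented for illustration.
reading = {"sensor_id": 17, "temperature": 21.5, "ok": True}

# XML encoding: one child element per field, values written as text.
elem = ET.Element("reading")
for key, value in reading.items():
    ET.SubElement(elem, key).text = str(value)
xml_bytes = ET.tostring(elem, encoding="unicode").encode("utf-8")

# Binary encoding: "!" = network byte order without padding,
# "I" = unsigned 32-bit int, "d" = 64-bit float, "?" = bool.
binary_bytes = struct.pack(
    "!Id?", reading["sensor_id"], reading["temperature"], reading["ok"]
)

print(len(xml_bytes), "bytes as XML,", len(binary_bytes), "bytes as binary")

# The binary form only means something to a reader who knows the layout.
decoded = struct.unpack("!Id?", binary_bytes)
```

The XML version carries its field names with it, which is where the extra bytes go; the binary version is smaller but opaque without the format string.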

Unknown said...

Ron, I remember using Hex editors to change code because a compile took too long and using the same Hex editors to look at binary dumps. XML can be read, and edited, by humans but as you say that certainly isn't a design principle of XML, hence the reason I have a go at the "XML is human readable" meme.

I 100% agree with you around the importance of having a standard here, but let's fight back against the meme: next time someone says "XML is human readable", slap down a multi-step, multi-partner link BPEL and ask them to tell you what it does :)

Anonymous said...

I love to use PHP's SimpleXML. I have several lists which change intermittently but not often enough to warrant the creation of a standalone editor. I use a lot of single tags with multiple attributes, and a simple regex reinserts the carriage return and line feed between tags, making it extremely easy to go in and edit by hand.
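
For anyone curious, the line-break trick looks roughly like this. It is sketched in Python rather than PHP, and the list contents, the one_tag_per_line name and the exact regex are my own assumptions rather than the code I actually run. It is only safe for attribute-only lists like this one, where no text content or attribute value contains "><".

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical list written on a single line, as many XML writers emit it.
FLAT = '<links><link name="home" url="/"/><link name="blog" url="/blog"/></links>'


def one_tag_per_line(xml_text):
    """Put a newline between adjacent tags so each one sits on its own line."""
    return re.sub(r">\s*<", ">\n<", xml_text)


readable = one_tag_per_line(FLAT)
print(readable)

# The extra whitespace does not change what a parser sees in this list.
links = ET.fromstring(readable).findall("link")
```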

Anonymous said...

Your post title asks whether XML should be human readable, but the question you actually ask is whether WSDL specifically should be.

You say 'XML...it's not designed to be human readable', but that's not true. The 6th design goal of XML stated in the W3C Recommendation is 'XML documents should be human-legible and reasonably clear.'

If you consider a few wider use cases for XML beyond enabling consumers to call a service it isn't at all hard to find some where human readability becomes very important indeed. To pick just one, sometimes you receive a message that seems to be causing a failure. Resolving this problem can be orders of magnitude easier if the message is human-readable and its intent can clearly and easily be understood.
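
As a small sketch of that debugging case, assuming a hypothetical and much simplified SOAP-like message (no namespaces, invented element names), Python's standard xml.dom.minidom can re-indent a one-line message so that the empty postcode field stands out:

```python
from xml.dom import minidom

# Hypothetical one-line message as it might appear in a log, for illustration only.
RAW = (
    "<Envelope><Header><MessageId>abc-123</MessageId></Header>"
    "<Body><GetLocation><postcode></postcode></GetLocation></Body></Envelope>"
)


def pretty(xml_text):
    """Re-indent an XML string, one element per line, two spaces per level."""
    return minidom.parseString(xml_text).toprettyxml(indent="  ")


formatted = pretty(RAW)
print(formatted)
```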

It's not even true that WSDL does not need to be human readable. As an architect I've worked in environments where tools such as SoapUI were not available and I've had to read WSDL to be able to see what services an API is exposing. Is it easy? No - you have to have an understanding of what's in a WSDL. Is it possible? Yes, absolutely.

XML purposefully trades bandwidth for readability as an aid to debugging and understandability without having to resort to tools. If meta-languages built on top of XML like WSDL and BPEL have abandoned that readability then perhaps we should question why they were developed in XML at all.