So I wrote a quick piece on Steve Vinoski's IEEE article, and a criticism was levelled by an anonymous poster that I didn't dissect the article piece by piece... never one not to learn from such constructive criticism, I thought I'd do just that.
So the paper starts off with a reasonable discussion of the origins of RPC, noting that its history dates back to the 1970s. This makes the first paragraph pretty much unique in that it's well referenced and backed up by facts.
On to the second paragraph. CORBA is lumped in as a "newer" technology that is on the wane, which is slightly odd given that in IT a nearly 20-year-old technology is rarely considered "new"; it's certainly not a term I've heard applied to C++ or even Java recently. The point made about CORBA is that it's "too complicated". At this stage Steve fails to mention that IIOP, the core protocol of CORBA, was in fact adopted by J2EE, so CORBA could be argued to be the most successful RPC approach of all time. It is also worth noting that DDI wasn't simply an RPC/IDL approach, but I guess the writer knows that.
The third paragraph states that SOAP is on the wane as well. This is backed up with a huge set^H^H^H^H^H^H^H^Hno statistics whatsoever. The writer seems to have a nice blind spot when it comes to enterprise applications and to be unaware that the likes of SAP and Oracle are using Web Services extensively in the extension and integration of their package solutions. Admittedly this is only a multi-billion dollar industry and indeed probably represents the lion's share of IT spending... hang on. Doesn't this mean that in fact SOAP is being used extensively, and indeed that its use is growing as people upgrade to Oracle Fusion and SAP NetWeaver? Isn't it also true that governments are increasingly using Web Services to integrate between countries and departments, and that companies are using SOAP pretty much as the default in B2B interactions? Indeed people describe using SOAP for these areas as a best practice. On the wane? More likely it's just that the people using them have a career.
The writer gives himself away by proposing that people are considering a Facebook approach as the next thing.... err, not in the companies I work with; absolutely no-one has suggested doing that. This suggests a blog-oriented rather than enterprise-oriented research style.
The fourth paragraph details, again without reference beyond the fact that transaction management was included in Argus, that RPC is fundamentally flawed. Now if it's fundamentally flawed, this should mean that it's impossible to build a distributed system with it. It references Jim Waldo's paper (but not Deutsch's fallacies) on remote calls being different to local ones. Err, that isn't a flaw of RPC; it's just a fact of life when doing remote systems... what on earth could the writer mean? Surely the writer isn't assuming that the local/remote problem, which existed while RPC was the dominant approach, goes away just because you do the remote work via another method?
So the writer asks
Why, then, do we continue to use RPC-oriented systems when they're fraught with well-known and well-understood problems?
So far these problems have been detailed as:
- Remote calls have more issues than local ones
- Remote transaction processing is a bitch
There are no other issues raised, and both of these points fall into the "well, duh" school of pointing out the obvious.
Then on to the meat of the article (emphasis mine):
RPC-oriented systems aim to let developers use familiar programming language constructs to invoke remote services, passing requests and data to them and expecting more data in response.
Let me for a moment shift that to the "familiar" world of the Web and REST.
Web-oriented systems aim to let developers use familiar web constructs to invoke remote resources, passing requests and data to them and expecting more data in response.
Ummm, I'm not seeing the huge difference at this stage. The next bit basically explains how a proxy infrastructure works and how it gives developers a consistent approach to the invocation of local or remote services. It almost sounds like it's pushing this as a benefit, in that it abstracts the developers away from the detail of the network. But that isn't the case; the writer in fact thinks this is the worst kind of thing.
Unfortunately, this approach is all about developer convenience. It's a classic case of everything looking like a nail because all we have is a hammer. In an object-oriented language[...], we represent remote services as objects and call methods or member functions on them. [...] We have a general-purpose imperative programming language hammer, so we treat distributed computing as just another nail to bend to fit the programming models that such languages offer.
What a load of crap. Seriously, this is an unmitigated pile of tripe about what it means to write distributed systems. It makes two basic errors:
- That the architecture and design of a system is focused on a programming language
- See number 1
Now (to quote Red Dwarf) I know that's really only one problem, but it's such a big one I thought I'd mention it twice. I've built distributed systems for years and I've known about Waldo, Deutsch and indeed plain old common sense for all of that time. The programming language (Ada, C, C++, Java, LISP, etc.) didn't matter; what I was first interested in was the entities that interacted. This is what SOA is all about: it's not about the objects and the methods, it's about the interacting entities and the knowledge that this interaction must be coarse grained. Architectures are normally programming language independent, and anyone who starts a distributed system by writing C++ is a muppet who should be fired. Focusing on the programming language indicates a very narrow perspective on what it takes to build a distributed system, and indeed a focus on the worst place to start considering distribution.
The issue raised is that providing an abstraction that hides the network is a really bad thing and is just about convenience. No it isn't. I've built distributed systems and I've had to manage teams who delivered the architectures I created, and I'll say that:
- 60% of the people didn't understand the challenges and wouldn't have understood Waldo
- 30% would have read it and got it wrong
- 6% understand the challenges and can make a decent crack at it with minor problems
- 4% actually understand what it takes
The writer's position is that everyone should know about the network. This is a false position, as it requires everyone to be of the same sophistication as the (very talented) writer. The reason the abstraction works is that smart people write the architecture and the interfaces and then manage others through the delivery; thus the lack of talent in the great unwashed is hidden from them, as they do not see the complexity. This is important: if you allow people without the talent to start designing distributed systems then it will go tits up; if however you architect the system so they are not aware of the distribution, and the efficiency is managed via the architecture, then you will deliver a successful project.
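To make that concrete, here's a minimal sketch (mine, not the article's, and the names are hypothetical) of what that separation looks like: the architect fixes a coarse-grained interface, and the network-aware plumbing lives behind it where the delivery team never sees it.

    // A sketch of the separation described above; the names are invented.
    // The architect publishes a deliberately coarse-grained interface...
    interface CustomerOrders {
        // ...one call returns everything the caller needs, rather than N chatty
        // property reads across the wire.
        OrderSummary getOpenOrders(String customerId);
    }

    // ...and the few people who do understand distribution own the implementation:
    // timeouts, retries and partial failure live here, invisible to the callers.
    final class RemoteCustomerOrders implements CustomerOrders {
        @Override
        public OrderSummary getOpenOrders(String customerId) {
            // bounded retries, a timeout budget and the mapping of transport
            // failures onto business-level errors would all go here
            throw new UnsupportedOperationException("remote plumbing elided");
        }
    }

    // A simple value object keeps the interaction coarse grained.
    record OrderSummary(String customerId, java.util.List<String> openOrderIds) { }

The 60% never touch the plumbing; they just call getOpenOrders and get on with delivering.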
The next section is about the issues of data mapping between programming languages, something the article scopes to IDL but which is an issue whenever any two systems interact via a third-party notation. It's an issue in XML as much as it is with CORBA's IDL; people can point to fewer issues, but not to the elimination of issues. It points out a few hacks from CORBA (e.g. by-value) but oddly (or conveniently) misses out the similar issues with SOAP and XML. This section is okay in that it talks about some of the challenges, but it suggests that the problems of intermediary mapping are limited to RPC systems; for some that is true, for others it is not. How about validated enumerations in XML, for instance? They've always been a bugger in languages that don't have the concept of limited enumeration sets. The point around arrays mapping to an array or a class is as true in XML and messaging or REST systems as it is in RPC. The writer omits this issue, however, as it undermines the point that he is trying to make.
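To illustrate with something other than CORBA (my example, not the article's; the schema and names are hypothetical), here is the same enumeration trade-off the moment you bind XML to a language:

    // Assume a hypothetical schema that restricts <status> to NEW | APPROVED | REJECTED.
    enum Status { NEW, APPROVED, REJECTED }

    final class StatusMapping {

        // Option 1: map to a language enum. The validation survives, but the moment
        // the provider adds a value (say ON_HOLD) this throws at runtime.
        static Status toEnum(String xmlValue) {
            return Status.valueOf(xmlValue); // IllegalArgumentException for ON_HOLD
        }

        // Option 2: map to a plain String. Nothing breaks on new values, but the
        // "validated enumeration" from the schema has quietly evaporated.
        static String toUnchecked(String xmlValue) {
            return xmlValue;
        }
    }

Exactly the same choice existed with IDL enums; swapping the notation didn't make it go away.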
In the next section, however, a wonderfully wrong statement kicks it off:
The illusion of RPC - the idea that a distributed call can be treated the same as a local call - ignores not only latency and partial failure[...]
Errr, apart from muppets, who thinks that a distributed RPC call can be treated the same as a local call? Every single RPC approach has a litany of exceptions that you have to catch because you are doing a remote call; that is baked into the frameworks. That, however, is only the coding highlight. The reality is that all decent architects have always known that the two are different and have worked accordingly. This really is the worst kind of argument. Setting up a completely specious claim as the basis for this area really does undermine the whole credibility of the writer. If the writer is saying that when he built RPC systems in the past, which he apparently has, he treated remote and local as the same, then the writer is a muppet. I don't think the writer is a muppet, therefore I have to conclude that he is pushing an agenda and is making the facts fit that agenda.
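For the avoidance of doubt about the "baked into the frameworks" point, here's the shape of a bog-standard Java RMI call (a minimal sketch; the service name is made up) — the compiler simply won't let you pretend it's local:

    import java.rmi.Naming;
    import java.rmi.Remote;
    import java.rmi.RemoteException;

    // Every method on an RMI remote interface must declare RemoteException...
    interface AccountService extends Remote {
        double getBalance(String accountId) throws RemoteException;
    }

    class AccountClient {
        public static void main(String[] args) {
            try {
                AccountService svc =
                    (AccountService) Naming.lookup("rmi://somehost/AccountService");
                System.out.println(svc.getBalance("12345"));
            } catch (Exception e) {
                // ...so the remote-ness (lookup failures, network failures) is forced
                // into the calling code whether the developer likes it or not.
                System.err.println("Remote call failed: " + e);
            }
        }
    }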
The next bit is a real cracker though. RPC systems lack the ability to do intermediation, which is defined as
caching, filtering, monitoring, logging, and handling fan-in and fan-out scenarios
The suggestion in the surrounding text is that it's impossible to build a large-scale system using RPC as it lacks these abilities. Now apart from the Business Service Bus stuff I've talked about, or the MCI approach that Anne Thomas Manes advocates, this must come as a big surprise to the companies doing ESBs, Web Service monitoring or Web Service gateways, which in fact provide all of these elements in just the sort of RPC environment the writer claims can't have them. Nothing in RPC could ever indicate whether a result is cacheable, for instance. Not sure how Jini's leasing fails to do this, or how having a field saying "cachetime: x" wouldn't work.
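As a minimal sketch of that (mine; the names are invented), here is the sort of "cachetime" field, and the client-side handling of it, that an RPC environment can carry without breaking a sweat:

    // An RPC-style response carrying its own cacheability hint.
    final class QuoteResponse implements java.io.Serializable {
        final String symbol;
        final double price;
        final long cacheTimeSeconds; // 0 means "do not cache"

        QuoteResponse(String symbol, double price, long cacheTimeSeconds) {
            this.symbol = symbol;
            this.price = price;
            this.cacheTimeSeconds = cacheTimeSeconds;
        }
    }

    // A client-side proxy or interceptor honours the hint; the calling code never
    // knows whether the result came from cache or across the wire.
    final class CachingQuoteClient {
        private final java.util.Map<String, QuoteResponse> cache = new java.util.HashMap<>();
        private final java.util.Map<String, Long> expiry = new java.util.HashMap<>();

        QuoteResponse getQuote(String symbol,
                java.util.function.Function<String, QuoteResponse> remoteCall) {
            Long until = expiry.get(symbol);
            if (until != null && System.currentTimeMillis() < until) {
                return cache.get(symbol);
            }
            QuoteResponse fresh = remoteCall.apply(symbol);
            if (fresh.cacheTimeSeconds > 0) {
                cache.put(symbol, fresh);
                expiry.put(symbol, System.currentTimeMillis() + fresh.cacheTimeSeconds * 1000);
            }
            return fresh;
        }
    }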
REST of course addresses all of these concerns (and more) by having... a field that says you can cache the result, something that is trivial to add to an RPC environment but is lauded as hugely different. REST, apparently, is also great because it doesn't fit with normal programming language abstractions.
Want to bet?
Let's say we have a set of resources: Articles, Writers, Comments. Articles have content and links to writers and comments. Each comment has a link to other comments (sub-comments) and to its writer. Now let's say I decide to hide all this from a developer; how hard would it be? The answer, of course, is that it would be trivial.
Article:
    content
    writer: Writer(link)
    foreach(comment in comments):
        commentList.add(Comment(comment.link))

Comment:
    content
    writer: Writer(link)
    foreach(comment in comments):
        commentList.add(Comment(comment.link))

Writer:
    name
    foreach(article in articles):
        articleList.add(article.link)
    foreach(comment in comments):
        commentList.add(Comment(comment.link))
Now when I retrieve an Article I process the XML; for every link I find to a comment or writer I create an object, passing the link to the constructor, which then loads the object. I could do this based on a config file which generates the classes automatically. Hell, it's just JAXB for Atom. So in fact you can make the REST approach fit perfectly. This works for GET and PUT (every time you update a field just do a PUT); and for POST? Well, if we have a POST to /newarticle then we have a factory which creates an article, etc. etc. POST a comment on an article? How about Article.addComment(content, user)? Have it return the URI and you're away.
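Here's roughly what that looks like in Java (my sketch, not anything from the article; parseField and parseLinks stand in for whatever Atom/JAXB-style binding you fancy):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.ArrayList;
    import java.util.List;

    // Wrap each resource in an object whose constructor takes the link,
    // exactly as the pseudo-code above describes.
    class Article {
        private final URI self;
        private final HttpClient http = HttpClient.newHttpClient();

        Article(URI self) { this.self = self; }

        String getContent() throws Exception {
            return parseField(fetch(), "content");
        }

        List<Comment> getComments() throws Exception {
            List<Comment> result = new ArrayList<>();
            for (URI link : parseLinks(fetch(), "comment")) {
                result.add(new Comment(link));     // every link becomes an object
            }
            return result;
        }

        // POST a comment; the Location header the server returns is the new URI
        URI addComment(String content, URI writer) throws Exception {
            HttpRequest post = HttpRequest.newBuilder(URI.create(self + "/comments"))
                    .POST(HttpRequest.BodyPublishers.ofString(
                            "<comment writer='" + writer + "'>" + content + "</comment>"))
                    .build();
            HttpResponse<String> resp = http.send(post, HttpResponse.BodyHandlers.ofString());
            return URI.create(resp.headers().firstValue("Location").orElseThrow());
        }

        private String fetch() throws Exception {
            HttpRequest get = HttpRequest.newBuilder(self).GET().build();
            return http.send(get, HttpResponse.BodyHandlers.ofString()).body();
        }

        // Stubs: a JAXB-for-Atom style binding (or a little XPath) fills these in.
        private String parseField(String xml, String field) { return ""; }
        private List<URI> parseLinks(String xml, String rel) { return List.of(); }
    }

    // Comment (and Writer) look the same: construct with a link, fetch on access.
    class Comment {
        private final URI self;
        Comment(URI self) { this.self = self; }
    }

The developer just sees objects; the GETs, PUTs and POSTs are plumbing.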
However we can hide it even more dreadfully by doing it all dynamically at runtime. Every property field can be retrieved on request by performing a GET and an XPath. In other words we can hide the network so well that every single property request results in a network request.
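To show that's not hand-waving, here's a deliberately dreadful sketch (mine, not the article's) using a standard Java dynamic proxy: every getter becomes a GET plus an XPath over the returned representation.

    import java.io.StringReader;
    import java.lang.reflect.InvocationHandler;
    import java.lang.reflect.Proxy;
    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.xpath.XPathFactory;
    import org.xml.sax.InputSource;

    // The developer codes against a plain interface and never sees the network.
    interface ArticleView {
        String getContent();
        String getWriter();
    }

    final class RestProxyFactory {
        static <T> T bind(Class<T> view, URI resource) {
            HttpClient http = HttpClient.newHttpClient();
            InvocationHandler handler = (proxy, method, args) -> {
                // "getContent" -> XPath "//content", fetched fresh on every single call
                String field = method.getName().substring(3).toLowerCase();
                HttpRequest get = HttpRequest.newBuilder(resource).GET().build();
                String xml = http.send(get, HttpResponse.BodyHandlers.ofString()).body();
                var doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                        .parse(new InputSource(new StringReader(xml)));
                return XPathFactory.newInstance().newXPath().evaluate("//" + field, doc);
            };
            return view.cast(Proxy.newProxyInstance(
                    view.getClassLoader(), new Class<?>[]{view}, handler));
        }
    }

    // Usage: every property access below is, invisibly, a network round trip.
    // ArticleView article = RestProxyFactory.bind(ArticleView.class,
    //         URI.create("http://example.org/articles/42"));
    // System.out.println(article.getContent());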
The next bit is that of course the URI can hide whether a call is local or remote. Something like Google Gears, for instance, could be used to hide that, but you could go further and have a protocol of local:// which indicates an object call, so if you say local://comment/ then it resolves the local object for the given comment id. Thus the URI itself can be used to subvert the local/remote visibility, and indeed to subvert the use of HTTP with its caching.
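A minimal sketch of that scheme trick (mine; local:// is of course not a real registered scheme, which is rather the point):

    import java.net.URI;
    import java.util.HashMap;
    import java.util.Map;

    // The scheme decides whether a "resource" is an in-process object or a network
    // fetch, so the calling code cannot tell the difference.
    final class ResourceResolver {
        private final Map<String, String> localComments = new HashMap<>();

        void registerLocal(String commentId, String content) {
            localComments.put(commentId, content);
        }

        String resolve(URI uri) {
            if ("local".equals(uri.getScheme())) {
                // local://comment/<id> -> straight object lookup: no HTTP, no cache headers
                String id = uri.getPath().replaceFirst("^/", "");
                return localComments.get(id);
            }
            return fetchOverHttp(uri); // anything else goes across the wire
        }

        private String fetchOverHttp(URI uri) {
            return "..."; // a real implementation would GET the representation here
        }
    }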
The statement is made that the "hypermedia constraint" will prevent this sort of approach. Now I'm clearly missing something, because I can obviously map all of the links, dynamically at runtime if required or statically via generation; I can clearly map all of the possible requests (again dynamically if required); and I can provide this via a standard programming approach in an OO language. In Java a dynamic proxy approach could work, while in dynamic languages it's just about getting the property name asked for and then creating the entity. So I can do a static generation (JAXB style) for a given point in time, or take a massive overhead at runtime, all of which I can hide from the developer. Furthermore, the caching is set by the server side, which means that the developer has no clue whether a request on a retrieved object will result in a local fetch of cached information or a traversal across the network. These look identical, as it's all hidden in the HTTP handling.
So REST can be made to fit in a "normal" programming language (unless someone can show me an example that couldn't fit), so again this isn't a real argument; it's just prejudice, and an example of the lack of sophistication in current REST frameworks and tools.
The writer admits that he used to "push" RPC (vendors and drug dealers... spot the difference) but doesn't admit that people have made RPC work for distributed systems. The writer then wishes that everyone used async as the standard approach. Again this misses the point that for most people async is a minefield of issues where they will bugger things up, rather than it representing any improvement. The theme of wishing that everyone in IT was as smart as the writer is fine, but it's not something on which to build the next generation of IT. As far as I can see, the more people who come into IT, the lower the average IQ in IT seems to go.
He then campaigns for Erlang, with some features that look a lot like Ada's task management approach, which was brilliant and readable, putting it at least one up on Erlang. Hell, let's go for Occam; it was great for multi-threading.
The next section is just wishing that everyone had done something differently. Well, I wish that people weren't focused on character saving over legibility, and how much better things would be as a result, but I haven't got my wish, and it's a beer discussion rather than a serious one.
The final bit is wishing that everyone would agree with the writer so everything will just be better, because, in his words, people who don't learn from history are doomed to repeat it. In this article, however, the writer appears not to have learnt from history, as he has chosen to selectively ignore facts that undermine his case and to make statements that do not stand up to scrutiny. So in summary:
- No decent architect built an RPC system which assumed local = remote
- SOAP systems in particular have done RPC with intermediation
- You can hide the network in REST systems
- Wishing that everyone is as smart as you is not the way to improve the lot of IT
- Development language and framework is tertiary to design and architecture in distribution
- Large scale distributed RPC systems exist and work today
- Distributed computing is about the architecture not about the coding
- The transactions of the world reside on systems that use RPC
This article certainly does appear to be a case of just the "Convenience over Correctness" problem the writer rails against.
Hopefully the commenter is now happy that I've taken the time to break down in detail the issues I had with the article.