Wednesday, May 14, 2008

REST on Mars - scaling the problem to make a point

One of the objections I've had about REST for a while is that it appears to ignore Deutsch's fallacies of distributed computing:
  1. The network is reliable.
  2. Latency is zero.
  3. Bandwidth is infinite.
  4. The network is secure.
  5. Topology doesn't change.
  6. There is one administrator.
  7. Transport cost is zero.
  8. The network is homogeneous.
Now, REST specifies 8, assumes 1, 2 and 3, and takes 4 to mean HTTP/S with Basic Authentication. To be clear, I've seen people doing Web Services who believe in pretty much all 8 of these fallacies, and they create crap systems. But with things like WS-RM and WS-Security there are at least answers to some of them.

A common pushback I've received is that, in this day and age, 1 is handled by the idempotent nature of REST, which is a reasonable point even if it puts the effort on the client. The follow-up on 2 and 3, however, has been that in modern networks they just aren't a problem. So in this day and age, just how bad could it be? Well, for local work inside a VM obviously no one would use either REST or WS-*, as it would represent a massive overhead, so clearly there are issues at the most latency-sensitive level. The next question is what happens if you have a very limited connection, somewhere outside the developed world or in remote parts of modern countries... or, to really stress the point... Mars.

This Sunday the Phoenix mission will aim to touch down on Mars. On board it has lots of sophisticated technology and lots of information to send back to base, but to dispatch this information it has a comms link with a maximum speed of 128kbps.

So let's say we want to get a new 1MB image every time the rover takes one, and this is being implemented by a bit of a muppet who is thinking about using either WS-* or REST, both of which are the wrong decision. Anyway, Mr Muppet looks at the REST approach and structures the resources as follows:
  1. GET on Rover to determine other available resources
  2. GET on Camera URI from the rover URI to determine available pictures
  3. Work out if there has been a new picture added (new URI available)
  4. GET on the new URI
  5. Once we've got the image, check that it is okay
  6. PUT on the URI to delete the image
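As a rough illustration, Mr Muppet's polling client might look like the Python sketch below. Everything in it is hypothetical: `FakeRover` is an in-memory stand-in for the rover's HTTP resources, and the URIs (`/rover`, `/rover/camera`) and dictionary bodies are invented for the example, so no real network is involved. It just counts the requests that would have to cross the link.

```python
from dataclasses import dataclass, field


@dataclass
class FakeRover:
    """Hypothetical in-memory stand-in for the rover's HTTP resources."""
    images: dict = field(default_factory=dict)  # image URI -> image bytes
    requests: int = 0                            # requests that would cross the link

    def get(self, uri):
        self.requests += 1
        if uri == "/rover":
            return {"camera": "/rover/camera"}      # rover lists its sub-resources
        if uri == "/rover/camera":
            return {"images": sorted(self.images)}  # camera lists current image URIs
        return self.images[uri]                     # the image itself

    def put(self, uri, body):
        # The post's design deletes with a null PUT; HTTP DELETE would be more idiomatic.
        self.requests += 1
        if body is None:
            del self.images[uri]


def poll_once(rover, seen):
    """One polling pass over steps 1-6; returns the images fetched."""
    fetched = []
    camera_uri = rover.get("/rover")["camera"]    # 1. GET on Rover
    image_uris = rover.get(camera_uri)["images"]  # 2. GET on Camera
    for uri in image_uris:
        if uri in seen:                           # 3. skip URIs already handled
            continue
        data = rover.get(uri)                     # 4. GET on the new URI
        if data:                                  # 5. placeholder "is it okay?" check
            fetched.append(data)
            seen.add(uri)
            rover.put(uri, None)                  # 6. null PUT to delete
    return fetched


rover = FakeRover()
seen = set()
print(poll_once(rover, seen), rover.requests)       # nothing new yet: [] 2
rover.images["/rover/camera/img1"] = b"fake image bytes"
print(len(poll_once(rover, seen)), rover.requests)  # one image: 1 6
```

Note that the two discovery GETs are paid on every poll, whether or not a new image has arrived.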
And then the muppet looks at the WS implementation and decides to use callbacks and WS-Notification to say when there is a new image, and then do a Web Service call to get the image:
  1. Register with Rover for the callback
  2. Receive callback when there is a new image, this gives you the ID of the image
  3. Call Rover.getImage(ID)
  4. Check the image
  5. Call Rover.deleteImage(ID)
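The callback flow can be sketched as a simple in-process observer pattern. This is a minimal sketch under that assumption: `FakeRoverService`, `GroundStation`, `register`, `capture`, `get_image` and `delete_image` are hypothetical names, not part of WS-Notification or any real WS-* toolkit. It counts messages so the one-off registration and the three-messages-per-image cost are visible.

```python
class FakeRoverService:
    """Hypothetical stand-in for the rover's Web Service (not a real WS-* API)."""

    def __init__(self):
        self._images = {}       # image ID -> image bytes
        self._subscribers = []  # registered callbacks
        self.messages = 0       # messages that would cross the link

    def register(self, callback):          # 1. one-off registration
        self.messages += 1
        self._subscribers.append(callback)

    def capture(self, image_id, data):
        """Camera takes a picture and pushes a notification carrying only its ID."""
        self._images[image_id] = data
        for callback in self._subscribers:  # 2. one notification per subscriber
            self.messages += 1
            callback(image_id)

    def get_image(self, image_id):          # 3. Rover.getImage(ID)
        self.messages += 1
        return self._images[image_id]

    def delete_image(self, image_id):       # 5. Rover.deleteImage(ID)
        self.messages += 1
        del self._images[image_id]


class GroundStation:
    """Hypothetical Earth-side subscriber."""

    def __init__(self, rover):
        self.rover = rover
        self.received = []
        rover.register(self.on_new_image)

    def on_new_image(self, image_id):
        data = self.rover.get_image(image_id)
        if data:                            # 4. placeholder image check
            self.received.append(data)
            self.rover.delete_image(image_id)


rover = FakeRoverService()
station = GroundStation(rover)
rover.capture("img1", b"fake image bytes")
print(len(station.received), rover.messages)  # 1 registration + 3 per image: 1 4
```

Nothing crosses the link until an image actually exists, which is the point of pub/sub over polling.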
Now it sort of looks like we have the same number of calls, but of course one interface is polling while the other is pub/sub. Let's say the camera takes an image every 6 seconds; this means a good polling interval will be around 6 seconds. If, however, the images are taken with a wide distribution (from every millisecond up to once an hour), then our polling needs to be much finer-grained. To be fair, though, let's say that it takes a fairly efficient 1.5 polls for each image received.
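The tension between polling interval and arrival pattern can be explored with a small simulation. This is a hypothetical sketch: `polling_stats` and the capture times are invented for illustration, and it ignores request duration and link latency entirely. It counts the polls made, the polls that find nothing new, and the longest time an image waits before a poll discovers it.

```python
import math


def polling_stats(capture_times, interval, horizon):
    """Simulate fixed-interval polling from t=0 up to `horizon` seconds.

    Returns (polls, empty_polls, worst_delay): the number of polls, how many
    found no new image, and the longest wait between a capture and the poll
    that discovers it. Assumes every capture time is > 0.
    """
    polls = [interval * k for k in range(1, int(horizon // interval) + 1)]
    empty = 0
    previous = 0
    for p in polls:
        # a poll is empty if nothing was captured since the previous poll
        if not any(previous < t <= p for t in capture_times):
            empty += 1
        previous = p
    # each capture is found by the first poll at or after it
    worst_delay = max(math.ceil(t / interval) * interval - t for t in capture_times)
    return len(polls), empty, worst_delay


# Regular captures every 6 s, polled every 6 s: no waste, no delay.
print(polling_stats([6, 12, 18, 24, 30, 36], 6, 36))  # (6, 0, 0)
# Bursty captures: two in the first seconds, one an hour later.
bursty = [1, 2, 3600]
print(polling_stats(bursty, 1, 3600))    # (3600, 3597, 0)
print(polling_stats(bursty, 600, 3600))  # (6, 4, 599)
```

With bursty captures, a 1-second interval wastes thousands of polls, while a 600-second interval cuts polling to six requests but lets an image sit for almost ten minutes.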

REST
Network costs
Okay, now down to the numbers. Per successful image request we have:
1.5x GET on ROVER to get the Camera URI
1.5x GET on Camera to get the Image URIs
1 GET on Image URI
1 PUT on Image URI

Now, assuming some efficient XML, let's say that the ROVER XML is about 200 bytes (limited number of resources) and the Camera XML is about 150 bytes (minimal beyond one image). The image is 1MB and the PUT is a null, so just a basic request (let's say 20 bytes, to be generous).

So in total we have 300 + 225 + 20 + 1,048,576 bytes, which is... 1,049,121 bytes or 8,392,968 bits. Over our 128kbps link this will take about 64 seconds of network time. Not too bad.
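The arithmetic is easy to check in a few lines of Python. The one assumption to flag is the link rate: the 64-second figure comes from reading 128kbps as 128 × 1024 = 131,072 bits per second; at 128,000 bits per second the same transfer takes about 65.6 seconds.

```python
LINK_BPS = 128 * 1024      # 128kbps read as 131,072 bit/s (this reading gives the post's 64 s)
IMAGE_BYTES = 1024 * 1024  # 1MB image

# Bytes per successful image, using the post's message-size guesses
rest_bytes = (
    1.5 * 200      # 1.5x GET on ROVER (~200 bytes of XML)
    + 1.5 * 150    # 1.5x GET on Camera (~150 bytes of XML)
    + 20           # null PUT to delete the image
    + IMAGE_BYTES  # GET on the image URI
)
rest_bits = rest_bytes * 8
rest_seconds = rest_bits / LINK_BPS

print(f"{rest_bytes:,.0f} bytes, {rest_bits:,.0f} bits, {rest_seconds:.1f} s")
# 1,049,121 bytes, 8,392,968 bits, 64.0 s
```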

WS-*
Over in Web Services land, Mr Muppet does the registration, which is a one-off cost of (let's say it's heavy on XML) 4048 bytes.

Then the ROVER sends the notification (another 4048 bytes)
A WS call to get the image: 1MB + 4048 bytes
A WS call to delete the image: 4048 bytes

So in this case we have a one-off cost of about 0.25 seconds.

Then for each image we have 1,048,576 + 3 × 4048 = 1,060,720 bytes, or 8,485,760 bits, which takes about 65 seconds: less than a second worse than REST.

Then there was latency
Ahh, but then we have latency. Earth is a MINIMUM of 56,000,000km away, which, assuming perfect transmission at the speed of light in a vacuum (a best case no real link will beat), means a one-way latency of 187 seconds.

Now with REST we have 5 calls and with WS-* we have 3 calls. This means that REST takes a total of 999 seconds (5 × 187 + 64), while WS-* takes about 626 seconds (3 × 187 + 65). Even if we allow the REST implementation to cache the camera URI (we could have the image sent with the notification on WS-* as well), it's actually the latency that matters in this equation, as much as if not more than the bandwidth.
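A slightly fuller model puts bandwidth and latency together for both designs. It follows the post's own assumptions: each call is charged one one-way light time, rounded to 187 seconds from the 56,000,000km minimum distance; the link is read as 131,072 bits per second; and each WS-* message carries a 4048-byte envelope on top of any payload. The constant and function names are illustrative only.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458
MIN_DISTANCE_KM = 56_000_000  # minimum Earth-Mars distance used in the post
LINK_BPS = 128 * 1024         # 128kbps read as 131,072 bit/s
IMAGE_BYTES = 1024 * 1024     # 1MB image
WS_ENVELOPE_BYTES = 4048      # the post's guess for a heavy WS-* message

# One-way light time, ~186.8 s, rounded to the post's 187 s
ONE_WAY_S = round(MIN_DISTANCE_KM / SPEED_OF_LIGHT_KM_S)


def transfer_s(n_bytes):
    """Seconds needed to push n_bytes over the link."""
    return n_bytes * 8 / LINK_BPS


# REST: 1.5 + 1.5 polling GETs, 1 image GET, 1 null PUT = 5 calls per image
rest_bytes = 1.5 * 200 + 1.5 * 150 + IMAGE_BYTES + 20
rest_calls = 1.5 + 1.5 + 1 + 1

# WS-*: notification, getImage, deleteImage = 3 messages, each with an envelope
ws_bytes = IMAGE_BYTES + 3 * WS_ENVELOPE_BYTES
ws_calls = 3

for name, n_bytes, calls in [("REST", rest_bytes, rest_calls), ("WS-*", ws_bytes, ws_calls)]:
    bandwidth = transfer_s(n_bytes)
    latency = calls * ONE_WAY_S  # as in the post, one one-way light time per call
    print(f"{name}: {bandwidth:.1f} s bandwidth + {latency:.0f} s latency = {bandwidth + latency:.0f} s")
# REST: 64.0 s bandwidth + 935 s latency = 999 s
# WS-*: 64.7 s bandwidth + 561 s latency = 626 s
```

Bandwidth comes out roughly even; the extra calls are what separate the two designs.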

My point here isn't to talk about WS-* vs REST but to point out that when writing distributed code you shouldn't ignore those 8 fallacies, and you shouldn't assume that everything will be fine. You might not have to communicate with Mars, but you might well want to deal with partners in the rest of the world where network links aren't as great. Even giants like Google have latencies close to 100ms, so a chatty approach will cause issues even in first-world networking environments. I've made the point before about creating scalable XML and Web Services, and I think it bears repeating:

The network isn't zero latency, it isn't infinite bandwidth, and assuming those things is not what makes distributed systems scalable. This is not to say that REST or WS cannot be made to work in a distributed way, but it does mean that these non-functional elements should feature in your design as much as obeying strict theoretical constructs.


8 comments:

Unknown said...

I still don't get all the hype about exchanging messages over the HTTP protocol. I still tend to favor XML over JMS instead of XML over HTTP, either REST or WS-*. Messaging protocols like JMS are faster, more reliable, secure, transactional and easier to implement.
I prefer to use WS-* only when I want to expose very coarse-grained services to external clients and I don't want to create a dependence with a specific technology.
However, it looks like complexity is easier to sell than simplicity in this world.

Anonymous said...

"one interface is polling while the other is pub/sub"

Mr Muppet made the mistake of changing two independent variables in his experiment. Now he can't know if the dependent variable (latency) increased as a result of the architecture choice or the technology.

If he had chosen instead to compare something with the same architecture, like Comet for example, I suspect he would have found both approaches to be pretty similar. My guess would be that the latency difference between them is insignificant. He would then have been able to conclude that it isn't so much about REST, but about optimizing the architectural choices for a given set of constraints.

Steve Jones said...

Gavin,

Comet wouldn't work here because of the challenge (and cost) of keeping the socket connection open. I did a presentation at JavaOne in 2002 which demoed an approach much like Comet to enable two mobile phones to communicate via a server intermediary.

Anonymous said...

so?

These are all architectural styles/choices and YOU have to decide based upon your problem. The argument you have given is exactly what I would expect from a REST maniac, but not from you, Steve.

If you have an application in which only NASA guys are accessing the Mars rover, then yeah, maybe REST isn't the best solution. But wait, if you allow EVERYONE to access those images, then have fun with WS-*!! REST would make sure that the images have been cached on EARTH, so the latency would be there only on the very first request.

I haven't really made a complete, convincing use case for REST, but my point is more that it is an architectural decision, and deciding which style to use is what architects are paid for.

Steve Jones said...

My argument, anon, is just to highlight the fallacy of ignoring network performance when looking at the scalability of a solution.

You are right that once the stuff is shifted to Earth, having a Web site (HTTP/REST) that provides access to all of the images, from now and the past, would be a good idea; in fact it would absolutely be the right way to go.

My point is simply that in distributed systems the network matters, and chatty solutions are often not a great idea. If it's a person waiting at the other end then you are okay, as people are relatively slow; if it's a computer then it's an issue.

I agree that an architect's job should be to choose the right style for the right job. Neither WS-* nor REST fits everywhere; in fact, even combined they don't address some standard problems and are limited to specific areas.

Anonymous said...

Yeah, the network does matter, and I have noticed that it is mostly the WS-*/CORBA people who have tried to push standard object methods onto the network without rethinking the solutions. It reminds me of a beautiful line by Steve Vinoski (he refers to ignoring the network as trying to maintain local/remote transparency):

"The layers of complexity required to maintain the resulting leaky illusion of local/remote transparency are reminiscent of the convoluted equations that pre-copernican astronomers used to explain how the sun and other planets moved around the earth. "

Steve Jones said...

Yup, objects over the wire are another chatty, and therefore crappy, way of doing distribution. Services != Objects != Web Services, but one thing Services DO equal is distributed.

Zubin Wadia said...

SteveJ, the point is well taken.

I would guess that if NASA were really doing this, they would set up the Rover to start pumping data back to Earth as soon as it begins to get incoming content. That just makes sense to me with that kind of latency. Why ask when the Rover can tell? It's not like NASA wouldn't want to know.

Then NASA just has to monitor the repository on Earth for incoming payloads. And IF they want to make a specific inquiry, they can do that.