So in the first article on five nines I talked about the challenge of SOA reliability and how the goal of reaching five nines gets really difficult when you are talking about distributed applications. I take on board one of the comments about getting the "right" information being the goal of an operating SOA, and it's that point that needs to be borne in mind. The goal of operation is to deliver a system that returns results which are acceptable to its consumers at that point in time, given the operational constraints that exist.
So first off let's start with the different options for solving the problems raised by Deutsch's Fallacies. Sure, some people will say "well duh!", but it's worth stressing that there are different options for solving the basic problem of distributed reliability, and which option applies depends on the business and technical drivers on your service. One size doesn't fit all. If you want a magic bullet, go and talk to a vendor; they'll be happy to sell you a nickel-plated one.
Distributed high availability is hard. The goal here is to help you understand what type of "hard" problem you are facing.
- Plan for failure
- Understand what the "minimum" operating requirement is
- Understand the time criticality of information
- Understand the accuracy requirement of information
Taking these one at a time, they have some big impacts on how you design, build and support SOA environments. The primary challenge, though, is to accept a basic truth: perfect operation is impossible. If you strive for perfection then you will never deliver five nines. The goal of High Availability SOA is to deliver acceptable operation at all times.
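To make "acceptable operation" a bit more concrete, here is a minimal sketch in Python of one way the four points above could combine in a single service call. Everything in it is an assumption for illustration: `DegradableLookup`, `Result`, `fetch_live` and `max_staleness_seconds` are hypothetical names, not part of any product or standard. The idea is simply that the call plans for the live source failing, and falls back to the last known value only while that value is still recent enough to be acceptable to the consumer.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass
class Result:
    value: Any
    age_seconds: float  # how old the returned information is
    degraded: bool      # True when served from the fallback copy


class DegradableLookup:
    """Hypothetical wrapper: live answer if possible, stale-but-acceptable otherwise."""

    def __init__(
        self,
        fetch_live: Callable[[str], Any],
        max_staleness_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_live = fetch_live
        # The time-criticality requirement: how old can data be and still be acceptable?
        self._max_staleness = max_staleness_seconds
        self._clock = clock
        self._last_known: Dict[str, Tuple[Any, float]] = {}  # key -> (value, stored_at)

    def get(self, key: str) -> Result:
        try:
            value = self._fetch_live(key)
        except Exception:
            # Plan for failure: the live source is assumed to fail sometimes.
            cached = self._last_known.get(key)
            if cached is None:
                raise  # nothing to fall back on
            cached_value, stored_at = cached
            age = self._clock() - stored_at
            if age > self._max_staleness:
                raise  # too old to meet the minimum operating requirement
            return Result(cached_value, age, degraded=True)
        # Remember the fresh value so a later failure can degrade gracefully.
        self._last_known[key] = (value, self._clock())
        return Result(value, 0.0, degraded=False)
```

A consumer would construct this with its own tolerance, for example `DegradableLookup(fetch_price, max_staleness_seconds=30)`, and check `degraded` on the result to decide whether the answer is accurate enough for what it is about to do. The right staleness limit is a business decision, not a technical one, which is rather the point.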