This is the study notes from the book Release It! Design and Deploy Production-Ready Software.

AVAILABILITY is one of the most frequently used term to describe the reliability of the system. In this section, we will dive deep into the hard of availability. We will discuss how availability is measured and how it is achieved in modern internet services.

First of all, we must realize that there is a tension between the availability and the cost:

  • The desire for greater availability.
  • The desire for minimizing cost.

Every improvement on the availability means an increase cost. But in the mean time, every bit availability improvement also means saved revenue. When you decide the availability of your system, please be clear about the availability level you want to achieve. You want to gathering availability requirements:

  • Availability time can be translated to the system downtime
  • The system downtime improvement means the reduced revenue loss
  • How much revenue loss the business can tolerate.
  • How much cost it takes to reduce the lose.

Next, you want to documenting availability requirements. One common mistake during documenting availabilities are define the system SLAs as a whole, without realizing that the system might have many features that each one carries with a different availability number. Therefore:

  • Define the SLAs in terms of specific features or functions of the system. Don’t define them vaguely based on the system.
  • Be aware of your dependent systems: you can?t afford a better SLA than the worst of the external dependencies involved in a feature.
  • Make sure you can define how you want to measure the availability: how do you know the feature or function is available. And you’d better equipped with some auto availability detection devices

How the high availability is achieve? One major technical is about load balancing. Load balancing is to distribute request across a pool or farm of servers to serve all requests correctly. Horizontally scalable systems achieve both availability and scalability through multiplicity. Adding more machines to increase capacity simultaneous improves resiliency to impulses. And the small servers can be added incrementally as needed, which is way more cost effective.

There are many approaches for load balancing.

DNS Round-robin

DNS solution provides a service name to IP address look up. By mapping the service to multiple IP address and return them in a round robin matter, we can achieve load balancing through DNS. DNS load balancing is often used for small to medium business websites. However, this approach have some security issues: front end IP addresses are visible and reachable from clients, which means they can be attacked. And DNS has no information about the health of the web server, it can send the request to a dead service not.

Reverse Proxy

Let’s first clarify what is a proxy server: it multiplex many outgoing calls into one single source IP address. While the reverse proxy does the opposite: demultiplex calls incoming into a single IP address and fans them to multiples address. Reverse proxy acts as an interceptor for every request. In the current reverse proxy implementation, another feature it to cache the static content to reduce the load on the web server. And since reverse proxy is in the middle of every request, can track which origin severs are healthy and responsive.

Hardware Load Balancer

Hardware load balancer is similar to a reverse proxy server. It can provide switching at layers 4 through 7 of the OSI stack. But it is way more expensive than other software options.


Clustering is different from a service pool. The servers within the cluster are aware of each other and actively participating in distributing load. There are two types of clusters for scaling in general:

  • Active/active cluster: for load balancing.
  • Active/passive: used for redundancy in the case of failure. One handles all the load until it fails and the passive one takes over and become active.

Load balanced clusters do not scale linearly, as clusters incur communication overhead such as heart beat. Within the cluster, the application most likely can coordinates its own availability and failover by having a master control node.