"Dead nodes don't bite." - Robert Louis Stevenson (quoted from memory)
CAP is often presented as a theorem you cannot escape. A common saying is: “nodes fail, network packets get lost, partitions arise, so you need CAP to choose your trade-offs.” Actually, the scope of CAP is not that wide. Let’s look at why a process crash or a node failure is not a partition in CAP.
CAP - the usual reminder
CAP is a theorem, initially conjectured by Eric Brewer [C1] and proved in 2002 by Seth Gilbert and Nancy Lynch [C2]. Its catchy definition is: “Consistency, Availability, Partition tolerance: pick two.”
Let’s recall the definitions used in the proof [C2].
Consistent is: “Atomic, linearizable, consistency [...]. There must exist a total order on all operations such that each operation looks as if it were completed at a single instant. This is equivalent to requiring requests of the distributed shared memory to act as if they were executing on a single node, responding to operations one at a time.”
Available is: “For a distributed system to be continuously available, every request received by a non-failing node in the system must result in a response.”
Partition is: “The network will be allowed to lose arbitrarily many messages sent from one node to another. When a network is partitioned, all messages sent from nodes in one component of the partition to nodes in another component are lost.”
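The consistency definition above is linearizability. Concretely, a history of reads and writes is linearizable if some total order of the operations both respects real time (an operation that finished before another started must come first) and matches single-register semantics. This can be sketched with a brute-force checker; the history format (`(kind, value, start, end)` tuples) is an assumption made for this illustration, not something from the paper:

```python
from itertools import permutations

# Toy linearizability check for a single register. An operation is a
# (kind, value, start, end) tuple; a history is linearizable if some
# total order respects real time and matches sequential semantics.
# Brute force (O(n!)), fine for tiny illustrative histories only.

def linearizable(history):
    for order in permutations(history):
        # real-time precedence: if a ended before b started,
        # a must appear before b in the chosen total order
        ok = all(not (a[3] < b[2] and order.index(a) > order.index(b))
                 for a in history for b in history)
        if not ok:
            continue
        # replay the order against a single register starting at 0
        reg, valid = 0, True
        for kind, value, _, _ in order:
            if kind == "write":
                reg = value
            elif reg != value:   # a read must return the current value
                valid = False
                break
        if valid:
            return True
    return False

# write(1) completes at t=2, so a read starting at t=3 must return 1:
assert not linearizable([("write", 1, 0, 2), ("read", 0, 3, 4)])
assert linearizable([("write", 1, 0, 2), ("read", 1, 3, 4)])
```

The second history is the behavior the CAP proof will show to be impossible under a partition: the read comes after the write, so consistency forces it to return the new value.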
Node failure vs. partition - the wrong intuition
A common but wrong intuition goes: “from the point of view of the remaining nodes, it is impossible to distinguish a node that has failed from a node that is on the other side of a partition: in both cases, it neither responds nor sends any message. As a consequence, there is no difference between a network partition and a node failure.”
Let’s test this intuition: first against the definition of CAP, then against its proof, and finally against real-life data store implementations.
Is this crocodile dead or is it just sleeping?
There is no difference until you try to stroke it.
Checking vs. the CAP definition
The definition of availability requires “every request received by a non-failing node in the system must result in a response.” So, a node failure is, by definition, outside the scope of the CAP theorem.
That’s simple. But maybe the proof proves more than the definition, and implicitly covers the node failure case? Let’s check.
Checking vs. the CAP proof
The proof is described by Gilbert and Lynch in their paper [C2]. They prove that you cannot be both available and consistent during a partition by exhibiting a scenario that cannot respect all these properties.
The scenario is simple:
Let’s consider a system with two nodes: N1 and N2.
step 0: a shared register receives an initial value: SR=0
step 1: N1 is partitioned from N2. By the partition definition, there can be no communication between the two nodes.
step 2: N1 writes a new value for SR: SR=1
step 3: N2 reads SR. It cannot get the latest value written by N1. Hence the system is either not consistent (N2 returns the old value) or not available (N1’s write fails or N2’s read fails).
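The steps above can be sketched as a small simulation. The `Node` class and the message-dropping model are illustrative assumptions, not code from the paper; the partition is modeled, as in the CAP definition, as total message loss between the two nodes:

```python
# Minimal sketch of the Gilbert-Lynch scenario: two nodes replicating
# a shared register SR, with a partition modeled as total message loss.

class Node:
    def __init__(self, name):
        self.name = name
        self.sr = 0          # local copy of the shared register SR
        self.peers = []      # nodes reachable over the network

    def write(self, value):
        self.sr = value
        # replicate to every reachable peer; messages to nodes on the
        # other side of a partition are simply lost
        for peer in self.peers:
            peer.sr = value

    def read(self):
        return self.sr

n1, n2 = Node("N1"), Node("N2")
n1.peers, n2.peers = [n2], [n1]

# step 1: partition N1 from N2 -- no messages get through
n1.peers, n2.peers = [], []

# step 2: N1 writes SR=1; the update cannot reach N2
n1.write(1)

# step 3: N2 answers the read (it stays available) but returns the
# old value: the system is not consistent
assert n1.read() == 1
assert n2.read() == 0
```

To stay consistent instead, N2 would have to refuse the read, giving up availability: exactly the dilemma the proof exhibits.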
This proof relies on having code running on both sides of the partition. If N2 is dead instead of being partitioned, the scenario cannot occur.
The CAP proof relies on having ‘real’ partitions, so the theorem cannot be applied when a node is simply dead. Does this matter in real life? Let’s check that as well.
Checking vs. the real world
Well, for the real world we just have to copy-paste what Michael Stonebraker said a while ago [C6]: “Now let us turn to single node failures. [...] You simply failover to a replica in a transactionally consistent way. Notably, at least Tandem and Vertica have been doing exactly this for years.”
There is nothing more to add. There are real-world systems that are both available and consistent when there is a node failure.
Back to our intuition
So our intuition was wrong. But why exactly? To answer, let’s go back to this specific sentence: “it is impossible to distinguish a node that has failed from a node that is on another partition.” This is true. The issue is:
- if you want to be available, you will keep running all the nodes on both sides of the partition. Whether one side sees the other nodes as dead or partitioned does not matter: they work independently by construction.
- if you want to be strongly consistent, you need to choose one side of the partition as the available one. The nodes on the other side will kill themselves, or at least will stop answering requests. In this case, a partition becomes equivalent to a failure, but this equivalence is obtained by implementing specific logic in all the nodes (quorums, masters, …).
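The strongly consistent behavior in the second bullet is typically implemented with a majority quorum: a node answers only if it can reach a strict majority of the cluster, so at most one side of a partition stays available. A minimal sketch, where the class name and the reachability model are assumptions for illustration:

```python
# Sketch of quorum-based behavior: a node serves requests only when it
# can contact a strict majority of the cluster, so the minority side of
# a partition makes itself equivalent to a failed node.

class QuorumNode:
    def __init__(self, name, cluster_size):
        self.name = name
        self.cluster_size = cluster_size
        self.reachable = set()   # names of peers this node can contact

    def has_quorum(self):
        # itself plus reachable peers must form a strict majority
        return len(self.reachable) + 1 > self.cluster_size // 2

    def handle_request(self):
        if not self.has_quorum():
            # minority side: refuse rather than risk inconsistency
            raise RuntimeError(f"{self.name}: no quorum, refusing request")
        return "ok"

# a 5-node cluster split 3/2 by a partition
majority_node = QuorumNode("N1", 5)
majority_node.reachable = {"N2", "N3"}   # sees 3 of 5 nodes (itself included)

minority_node = QuorumNode("N4", 5)
minority_node.reachable = {"N5"}         # sees only 2 of 5 nodes

assert majority_node.handle_request() == "ok"
try:
    minority_node.handle_request()
except RuntimeError:
    pass  # the minority side behaves exactly like a dead node
```

This is how the “partition equals failure” equivalence is manufactured: the minority nodes actively take themselves out of service rather than discovering the equivalence for free.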
This is an implication of the CAP theorem: if we want to cover any kind of operation, as CAP does, the only way to be consistent is to kill, logically or physically, all the nodes that are not on the right side of the partition. In other words, in a strongly consistent system the intuition is right, or, more precisely, actions are taken to make it right. In an AP system, however, the intuition is just wrong.
A node failure is not a partition, and the CAP theorem says nothing about failures. As such, it should not be used when reasoning about node failures or server crashes.
[C1] Eric A. Brewer, PODC Keynote, July 19, 2000, Towards Robust Distributed Systems
[C2] Gilbert and Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. ACM SIGACT News (2002)
[C6] Michael Stonebraker, Clarifications on the CAP Theorem and Data-Related Errors, 2010