Thursday, March 26, 2015

The confusing CAP and ACID wording

CAP and ACID share a common vocabulary: Atomic, Consistent, and so on. But there is a catch: the words are the same, but they mean totally different things. CAP comes from distributed systems theory, while ACID belongs to database systems. Distributed databases use both the CAP and the ACID vocabulary, so this obviously creates a lot of confusion. When someone says “one should not give up consistency”, which consistency are they speaking about? Let’s have a look at the definitions of Atomic-Consistent-Isolated-Durable and Consistent-Available-Partition-tolerant.

ACID & CAP - A reminder

ACID properties were identified in the 70’s, then the term ACID was coined in 1983. We have:
  • A for Atomicity
  • C for Consistency
  • I for Isolation
  • D for Durability


Reminder: the ACID properties were defined in
California during the 70’s.


The CAP theorem was conjectured by Eric Brewer in 2000, and proved in 2002 by Seth Gilbert and Nancy Lynch. We have:
  • C for Consistency
  • A for Availability
  • P for Partition-tolerance.


CAP and ACID wording - the confusion summarized

First, let’s see the global picture.


| Word | Databases (ACID) | CAP | Confusion |
| --- | --- | --- | --- |
| Transaction | A set of operations. | The word and the concept are not used. | No |
| Durable | “Once a transaction completes successfully, its changes to the state survive failures.” [D2] | The word and the concept are not used. | No |
| Consistent | Integrity constraints on the data (data types, relations, …). | For CAP, Consistency is a shortcut for “Atomic Consistency”. Atomic consistency is a consistency model. More on this later. | Same word, different concepts |
| Isolation | “Even though transactions execute concurrently, it appears to each transaction T that others executed either before or after T.” [D2] | The word is not used in CAP, but Isolation as defined in ACID is a consistency model in CAP vocabulary. | Different words, same concept |
| Atomic | All changes happen or none happen. | For CAP, Atomic is a consistency model, the one used in the CAP proof. | Same word, different concepts |
| Available | Concept not often used. When it is, the definition can differ from CAP’s, i.e. available may not require all the non-failing nodes to respond. | “Every request received by a non-failing node in the system must result in a response.” [C2] | Same word, same concept, different definitions |
| Partition | Concept not often used. When it is, the definition is the same as in CAP. | Two sets of nodes are partitioned when all messages between them are lost. | No |


Now, let’s dig into the details: we’re going to find that there are some extra sources of confusion for distributed databases.


Transaction (only in ACID)

A transaction is a set of operations. Any of these operations can read or write multiple data items. ACID is about giving this set of operations the same properties as if it were a single operation. That’s not an objective for CAP: CAP is about the possible properties of multiple operations using the same data, possibly replicated.


Durability (only in ACID)

“Once a transaction completes successfully, its changes to the state survive failures” is quite clear, but leaves the failure description to the physical deployment. This depends mostly on redundancy: multiple disks on a single node and/or multiple nodes and/or multiple sites. “Survive” does not imply any notion of availability: it means that it will be at least possible to restore the data later.


CAP itself does not mention durability. Durability in CAP is implicit: CAP is about partitions, not node failures.


Available-in-CAP

In CAP, being available means that all non-failing nodes continue to serve requests when the system is partitioned. Many distributed systems consider themselves available if, during a partition, some non-failing nodes continue to serve requests. These systems are not available-in-CAP.


Consistency-in-CAP and Atomic-in-CAP

Consistency in CAP is a shortcut for Atomic Consistency. Atomic consistency is a consistency model, and a consistency model describes how the operations of a system can be ordered. The list of operations depends on the system. For example, it’s possible to define the consistency model of a transactional system, hence saying that ‘commit’ is one of the operations. The CAP proof is done on a distributed shared memory model defined by Lynch in [V6], using read/write/ack operations.


The choice of a consistency model is far from trivial. There are many consistency models because there are many possible trade-offs between:
  • How easy the consistency model is to use. This may also depend on the application itself: some consistency models may be easier to use for some applications than for others.
  • How efficiently the consistency model can be implemented. This may also depend on the hardware and the physical deployment in general.


The consistency models used in ACID and CAP are actually simple:
  • Sequential consistency, as defined by Lamport [V9]: “the program behaves as if the memory accesses of all processes were interleaved and then executed sequentially.”
  • Atomic consistency (also called linearizability [V8]) is sequential consistency plus a real-time constraint: “Unlike sequential consistency, linearizability implicitly assumes the notion of an observable global time across all processes. Operations are modeled by an interval which consists of the period of time between the invocation and response for the operation and each operation is assumed to take effect instantaneously at some point within this interval.” [V7]


CAP says “Consistent” for Atomic Consistency: it’s just a shortcut. For example, if a write of x=1 has completed (in real time) before a read of x starts, atomic consistency requires the read to return 1, while sequential consistency still allows it to return the old value. Atomic-in-CAP (an ordering constraint on operations) and Atomic-in-ACID (all or nothing) are not the same thing at all.


Consistency-in-ACID

Consistency in ACID relates to data integrity. For example, it is often possible to implement these rules in a SQL database:
  • this field is not null
  • this field is a number
  • this field is an existing reference to another field in another table.


The database will not let you commit a transaction that breaks a constraint. That’s the consistency-in-ACID contract. This consistency definition has no equivalent in CAP.
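
As an illustration, here is a minimal sketch of this contract. It assumes an in-memory H2 database on the classpath and a hypothetical account/transfer schema; any SQL engine with constraint support would behave the same way:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class AcidConsistencyDemo {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:demo");
             Statement st = conn.createStatement()) {
            // Declare the integrity constraints: NOT NULL, a numeric type, a foreign key.
            st.execute("CREATE TABLE account (id INT PRIMARY KEY, balance INT NOT NULL)");
            st.execute("CREATE TABLE transfer (id INT PRIMARY KEY, "
                     + "account_id INT NOT NULL REFERENCES account(id))");
            st.execute("INSERT INTO account VALUES (1, 1000)");
            try {
                // Violates the foreign key: account 42 does not exist.
                st.execute("INSERT INTO transfer VALUES (1, 42)");
            } catch (SQLException e) {
                // The engine enforces consistency-in-ACID by rejecting the write.
                System.out.println("Rejected: " + e.getMessage());
            }
        }
    }
}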


It’s important to remember that a database, SQL or not, cannot check all the consistency constraints. Let’s quote Gray and Reuter [D2]: “the system underneath has no means for checking all the consistency constraints there are. Most of them are not formalized in the first place.”


And to conclude on Consistency-in-ACID, let’s quote Gray and Reuter again [D2]: “It is important to keep in mind that the consistency definition that comes with the transaction paradigm is largely a syntactic one.”


Atomic-in-ACID

The “all-or-nothing” behavior is quite intuitive. For example, consider this transaction, written in pseudo-code, simulating a transfer between two accounts:


begin
   val1 = read(account1)
   val2 = read(account2)
   newVal1 = val1 - 100
   newVal2 = val2 + 100
   write(account1, newVal1)
   write(account2, newVal2)
commit


Atomic means that either both accounts will be updated or neither will. If the write to account1 succeeds but the write to account2 fails, the two writes will be rolled back.
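
To make this concrete, here is a hedged JDBC sketch of the same transfer. It assumes an open Connection to a database with an account(id, balance) table; the table name and amounts are illustrative:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class AtomicTransferDemo {
    // Either both updates are committed together, or neither is.
    static void transfer(Connection conn) throws SQLException {
        conn.setAutoCommit(false); // begin
        try (PreparedStatement debit = conn.prepareStatement(
                 "UPDATE account SET balance = balance - 100 WHERE id = 1");
             PreparedStatement credit = conn.prepareStatement(
                 "UPDATE account SET balance = balance + 100 WHERE id = 2")) {
            debit.executeUpdate();
            credit.executeUpdate();
            conn.commit(); // both changes become durable and visible together
        } catch (SQLException e) {
            conn.rollback(); // all-or-nothing: undo any partial work
            throw e;
        }
    }
}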


However, being atomic-in-ACID does not mean that this transaction is isolated from the others. In other words, you can claim to be atomic-in-ACID even if:
  • the values written are visible to others before the transaction is actually committed.
  • the values read can be modified by others during the transaction. If you read the same value multiple times you may get different results.


For example, with the transaction proposed above, you can expect this behavior with many SQL databases:
  • start with 1000 on account1, 0 on account2
  • run two transfers in parallel (the code above)
  • while you’re expecting 800 on account1 and 200 on account2, you may get 900 on account1: both transfers read 1000 from account1, both compute 900, and one write overwrites the other.


This is because atomicity is different from isolation in ACID: atomic/all-or-nothing does not mean isolated.


Isolation

The specification of isolation given by Gray and Reuter in [D2] is: “Even though transactions execute concurrently, it appears to each transaction T that others executed either before or after T.” This defines a consistency model, like the consistency of CAP. With this definition, any transaction is fully isolated. It’s easy to understand and to use.


Isolation in theory: the developer has nothing
to do thanks to the serializability property


However, this was difficult to implement efficiently, and databases had to relax the constraints. As a consequence, isolation in an ACID database comes in several degrees, namely “serializable”, “repeatable read”, “read committed”, and “read uncommitted”.


The one most commonly used by default is “read committed”: a transaction sees only data that has been committed.
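
With JDBC, for example, the degree is selected per connection; here is a minimal sketch (the actual default and the supported levels depend on the engine):

import java.sql.Connection;
import java.sql.SQLException;

public class IsolationLevelDemo {
    static void useReadCommitted(Connection conn) throws SQLException {
        // Ask for "read committed" explicitly instead of relying on the engine's default.
        conn.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
        // The other standard JDBC degrees: TRANSACTION_READ_UNCOMMITTED,
        // TRANSACTION_REPEATABLE_READ and TRANSACTION_SERIALIZABLE.
    }
}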


While this looks simple, there are a few catches:
  1. These definitions were made with an implementation in mind. As Hellerstein, Stonebraker and Hamilton said [D1]: “Both rely in subtle ways on an assumption that a locking scheme is used for concurrency control, as opposed to an optimistic or multi-version concurrency scheme. This implies that the proposed semantics are ill-defined.”
  2. For a given isolation degree, not all databases have the same behavior. This is explained by Martin Kleppmann in [D4].
  3. Isolation impacts not only functional correctness, but technical correctness as well: most databases use locks behind the scenes. Concurrent executions can create deadlocks: unexpected or unmanaged dependencies between independent transactions create situations in which each transaction waits for another one to release its resources. In such a situation, one of the transactions is stopped by the database engine and fails, even though it is functionally and syntactically correct.


At the end of the day, database users need to understand far more than the database consistency model: they need to understand how it is actually implemented by the database engine they are using.


Let’s look at our example above again. If it is run on a database using “read committed” (a common default), we may get an incorrect result; we may actually create money. This can be fixed by locking the values explicitly. The pseudo-code for a transaction in “read committed” mode, on a database engine using locks, could be:


begin
   val1 = readAndLock(account1)
   val2 = readAndLock(account2)
   newVal1 = val1 - 100
   newVal2 = val2 + 100
   write(account1, newVal1)
   write(account2, newVal2)
commit // release all locks
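
On a SQL engine, the readAndLock step above typically maps to SELECT ... FOR UPDATE. Here is a hedged JDBC sketch of the locking transfer, assuming an account(id, balance) table:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class LockingTransferDemo {
    // Reads a balance and locks the row until commit, like readAndLock in the pseudo-code.
    static int readAndLock(Connection conn, int accountId) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                 "SELECT balance FROM account WHERE id = ? FOR UPDATE")) {
            ps.setInt(1, accountId);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                return rs.getInt(1);
            }
        }
    }

    static void transfer(Connection conn, int from, int to, int amount) throws SQLException {
        conn.setAutoCommit(false); // begin
        try {
            // Two transfers going in opposite directions can deadlock here unless
            // the rows are always locked in the same order (e.g. by ascending id).
            int fromBalance = readAndLock(conn, from);
            int toBalance = readAndLock(conn, to);
            try (PreparedStatement upd = conn.prepareStatement(
                     "UPDATE account SET balance = ? WHERE id = ?")) {
                upd.setInt(1, fromBalance - amount);
                upd.setInt(2, from);
                upd.executeUpdate();
                upd.setInt(1, toBalance + amount);
                upd.setInt(2, to);
                upd.executeUpdate();
            }
            conn.commit(); // release all locks
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }
}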


With this comes a lot of the complexity already met in concurrent programming languages like Java. Databases actually add an extra level of complexity, as the lock range can be wider (pages, tables, …) and can be changed dynamically (escalated) by the database engine. Moreover, deadlocks or performance requirements can lead to using “read uncommitted” for some transactions that are supposed to read only immutable data. This can lead to complex issues if someone, somewhere (let’s say, a professional services expert on a customer’s site), modifies this theoretically immutable data while the system is running.


Isolation in practice, the database role being played by the gnu.


We see that the ‘C’ of CAP and the ‘I’ of ACID are very similar. But CAP, being a theorem, can stick to the model, while performance constraints forced databases to add multiple levels of parameterization, and forced database users to understand how isolation is actually implemented.


Conclusion
Of the 4 letters in the ACID acronym, 3 have a meaning different from their equivalent in CAP. No wonder this is confusing. Also, the confusion does not only come from the overlaps in the wording, but also from the necessity to go into the implementation details to understand the differences in a real-life concurrent application. In practice, this is one of the difficulties that has led many people (me included) to look at the ‘NoSQL’ world.


See the Hacker News discussion on this post.





References

[C2] Gilbert and Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. ACM SIGACT News (2002)
[C6] Michael Stonebraker, Clarifications on the CAP Theorem and Data-Related Errors, 2010
[D1] J. M. Hellerstein, M. Stonebraker, J. Hamilton, Architecture of a Database System, Foundations and Trends in Databases Vol. 1, No. 2 (2007) 141–259
[D2] J. Gray and A. Reuter, Transaction Processing: Concepts and Techniques. Morgan Kaufmann, 1993.
[D4] M. Kleppmann, “Hermitage: Testing the “I” in ACID”, blog post, 2014
[V6] Nancy Lynch, Distributed Algorithms, Morgan Kaufmann, 1996
[V7] Kourosh Gharachorloo, “Memory Consistency Models for Shared-Memory Multiprocessors,” Stanford Technical Report, 1995
[V8] M. Herlihy, J. Wing “Linearizability: A Correctness Condition for Concurrent Objects”, ACM, 1990
[V9] Leslie Lamport, “How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs”, IEEE Trans. Comput. C-28, 9 (Sept. 1979), 690-691.

Wednesday, March 18, 2015

Don't use the CAP theorem for packet losses

In the previous post, we looked at this common saying: “nodes fail, network packets get lost, partitions happen, so you need to use CAP to understand your trade-offs.” We saw that node failures were not partitions. What about packet losses? Most distributed applications use TCP or UDP on top of IP, and it is well known that IP is an asynchronous protocol that can lose packets. So should we use all the results from the theory of asynchronous networks? Must we use CAP to do some trade-offs if we are using a network that can drop packets?


The answer is no. The root issue lies in the incompleteness of our description of IP. “IP is an asynchronous protocol and it can lose packets” is true, but incomplete, and this incompleteness is misleading. Let’s discuss why.


CAP - The usual reminder

CAP says that a distributed system cannot be simultaneously Consistent, Available and Partition-tolerant.


We use here the definitions from the proof [C2].
Consistent is [C2]: “Atomic, or linearizable, consistency [...]. There must exist a total order on all operations such that each operation looks as if it were completed at a single instant. This is equivalent to requiring requests of the distributed shared memory to act as if they were executing on a single node, responding to operations one at a time.”


Available is [C2]: “For a distributed system to be continuously available, every request received by a non-failing node in the system must result in a response.”


Partition is [C2]: “The network will be allowed to lose arbitrarily many messages sent from one node to another. When a network is partitioned, all messages sent from nodes in one component of the partition to nodes in another component are lost.”


How can you lose packets on a network

Packet loss happens when a piece of network equipment receives more messages than it can send. Losing packets is common on an IP network. A TCP connection tries to use most of the available bandwidth: it sends nearly as many packets as it can, until it loses some. TCP takes packet loss as a congestion indicator: when it happens, the connection slows down sharply, then progressively tries to send more packets again.


So, on an IP network used with TCP, you will lose packets unless all the applications, at all times, send and consume packets slowly enough to stay below the available bandwidth. Even on a LAN, with mixed workloads and large analytic jobs, applications can quickly reach the congestion point. So, obviously, we need to take care of packet losses.
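
The slow-down-then-speed-up behavior described above is, in spirit, an additive-increase/multiplicative-decrease rule. Here is a toy sketch of the idea; the constants and the loss pattern are illustrative, not TCP’s actual algorithm:

public class ToyAimd {
    public static void main(String[] args) {
        double window = 1.0; // congestion window, in packets per round trip
        boolean[] lossObserved = {false, false, false, true, false, false, true, false};
        for (boolean loss : lossObserved) {
            if (loss) {
                window = Math.max(1.0, window / 2.0); // multiplicative decrease on a loss
            } else {
                window += 1.0;                        // additive increase otherwise
            }
            System.out.println("window = " + window);
        }
    }
}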


Losses do happen


How TCP manages packet losses

TCP is no magic trick: the receiver must send acknowledgements for the data it receives. The sender of a packet keeps it in a buffer until it has received an acknowledgement from the receiver. If the sender does not receive an acknowledgement after a calculated delay, it considers the packet lost and sends it again. The calculation of this delay is complex and has evolved over time.
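
This is not TCP’s real algorithm, but the keep-until-acknowledged, timeout-and-resend principle can be sketched with a toy stop-and-wait sender over UDP; the 2-second timeout, the port 9999 and the existence of a receiver are assumptions for the example:

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.SocketTimeoutException;

public class StopAndWaitSender {
    public static void main(String[] args) throws Exception {
        byte[] payload = "hello".getBytes();
        DatagramPacket packet = new DatagramPacket(
            payload, payload.length, InetAddress.getByName("localhost"), 9999);
        byte[] ackBuf = new byte[16];
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.setSoTimeout(2000); // the "calculated delay", fixed at 2 s in this toy
            while (true) {
                socket.send(packet);   // the sender keeps the data until it is acknowledged
                try {
                    socket.receive(new DatagramPacket(ackBuf, ackBuf.length));
                    break;             // an ack arrived: the data was delivered
                } catch (SocketTimeoutException e) {
                    // No ack before the timeout: consider the packet lost and send it again.
                }
            }
        }
    }
}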


What are message losses in the CAP proof

CAP is quite clearly based on a network that can lose messages [C2]: “The network will be allowed to lose arbitrarily many messages sent from one node to another.” So even when there is no partition we can lose messages.


Asynchronous network model

TCP as described above corresponds to a partially synchronous network: each node has a local clock, and it is this clock that allows it to declare that some packets are lost. A purely asynchronous network does not have this local clock and cannot have such a timeout mechanism.


There are several implications. On an asynchronous network that can lose messages:
  • It is not possible to be available and consistent and partition tolerant. That’s CAP.
  • It is not possible to be available: as any message can be lost, it is not possible to guarantee that all the requests will arrive, and, even if they do, that their answer won’t be lost as well.


So, the asynchronous model does not allow us to draw any valuable conclusion when dealing with availability and message losses: nearly everything is impossible. This is not a big issue, because CAP is proved not only for asynchronous networks but also for partially synchronous ones. This is why nobody has “beaten CAP”.


Partially synchronous network

In the network model used by the CAP proof, a clock is added to each node. This allows a very simplified model close to TCP: messages are sent again if there is no acknowledgement after a certain delay. In other words, it builds a network that does not lose messages on top of a network that can lose them, exactly as TCP does. CAP still holds on such networks, because during a partition all messages between the two sides are lost, so sending the messages again does not help.


More on the asynchronous model

The asynchronous model has many advantages:
  • It is simple to understand.
  • Proofs are likely to be simpler to formalize and describe under this model than under a partially synchronous one. Validation by peers is also easier.
  • Quoting Nancy Lynch [V6]: “The asynchronous model assumes less about time than is actually guaranteed by typical distributed systems. Thus, algorithms designed for the asynchronous model are general and portable, in that they are guaranteed to run correctly in networks with arbitrary timing behavior.”


However, the fact that there is an impossibility result on an asynchronous model does not mean that it is impossible in reality. For example, in 1985, Fischer, Lynch and Paterson [V2] proved the impossibility of distributed consensus with a faulty process on the asynchronous model. Their conclusion was rather modest, especially considering how important the result was: “These results do not show that such problems cannot be “solved” in practice; rather, they point up the need for more refined models of distributed computing that better reflect realistic assumptions about processor and communication timings.”


Difference between model and reality: these two tigers are white, but
one eats much more meat than the other


Conclusion

Networks do lose packets. But TCP hides packet losses from the application, and that’s what the CAP proof does as well. In any case, packet loss is irrelevant when using the CAP theorem: CAP is about partitions, not node failures or packet losses. While it is always possible to make the right decision despite basing it on flawed logic, it is generally better to use a theorem only when it actually applies to the problem: dead nodes and packet losses are bad, but they do not force you to choose between availability and consistency.




References
[C2] Gilbert and Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. ACM SIGACT News (2002)
[V2] M. J. Fischer, N. A. Lynch, and M. S. Paterson. Impossibility of distributed consensus with one faulty process. 1985
[V6] Nancy Lynch, Distributed Algorithms, Morgan Kaufmann, 1996