Tuesday, May 12, 2015

Eventual Consistency and Durability

“A son can bear with equanimity the loss of his father, but the loss of his data may drive him to despair.” (Machiavelli, quoted from memory)

While traditional databases are ACID, with the ‘D’ standing for Durability, NoSQL databases are mainly described by their consistency model, i.e. Strong Consistency vs. Eventual Consistency. We're going to see that durability is not something you can forget about so easily.

Definition of Eventually Consistent

Eventual Consistency was defined by Werner Vogels in [E2]: “the storage system guarantees that if no new updates are made to the object, eventually all accesses will return the last updated value.”

It’s one definition among many. Bailis and Ghodsi, for example, mention [E3]: “Informally, it guarantees that, if no additional updates are made to a given data item, all reads to that item will eventually return the same value.” This definition only says that the value will converge to something (say ‘42’). In 1995, Terry used [V5]: “All servers move towards eventual consistency. That is, the Bayou system guarantees that all servers eventually receive all Writes.”

As most implementations available today are inspired by the Dynamo paper [E1] and by Vogels’ work, it’s safer to stick to Vogels’ definition: he knew all of them and chose his words carefully. In the CAP series, I explained the importance of this definition.

And finally, this consistency is not the same thing as the ‘C’ (Consistency) in ACID.

Using an eventually consistent database

So if we have an eventually consistent key-value database, we can implement a client that does something like this:

ecStorage.put("anId", "someData");
while (ecStorage.get("anId") == null) sleep(1000); // wait 1 second between retries

We know that this program will terminate: even if the first gets may not see our write, eventually they will see it.
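The retry loop above can be sketched against a toy in-memory store. This is a minimal sketch, not a real client: the store class and its replication delay are hypothetical, used only to simulate a write that becomes visible eventually rather than immediately.

```python
import time

class ToyEventuallyConsistentStore:
    """Hypothetical store: a write becomes visible only after a simulated
    replication delay, mimicking the window where reads return null."""
    def __init__(self, replication_delay=0.05):
        self.delay = replication_delay
        self.writes = {}  # key -> (value, time of the write)

    def put(self, key, value):
        self.writes[key] = (value, time.time())

    def get(self, key):
        if key not in self.writes:
            return None
        value, written_at = self.writes[key]
        # The value is visible only once the "replication" has had time to run.
        return value if time.time() - written_at >= self.delay else None

store = ToyEventuallyConsistentStore()
store.put("anId", "someData")
while store.get("anId") is None:   # the first gets may return None...
    time.sleep(0.01)               # ...but the guarantee says this loop ends
print(store.get("anId"))           # -> someData
```

The loop terminates precisely because the store honours Vogels’ guarantee: once no new updates are made, reads eventually return the last written value.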

Eventually Consistent database - possible implementation

The most usual implementation is the one described, once again by Werner Vogels, in the same [E2] paper:
N = The number of nodes that store replicas of the data.
W = The number of replicas that need to acknowledge the receipt of the update before the update completes.
R = The number of replicas that are contacted when a data object is accessed through a read operation.

A typical value for N is 3: the data is replicated on 3 nodes, asynchronously or not, depending on the value of W.
Values for W and R vary between 1 and 3, depending on the trade-off you want between performance and consistency (as Vogels notes in [E2], W + R > N guarantees that the read set and the write set overlap).
If we write with W=1, the initial write goes to a single node. The write is finished, from the client's point of view, as soon as that node stores the data. The data is then asynchronously replicated to the other two nodes.
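The W=1 write path can be sketched as follows. This is a toy model under stated assumptions: the cluster class, the `lagging` queue standing in for asynchronous replication, and the node names are all hypothetical, not part of any real Dynamo implementation.

```python
import random

class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True

class DynamoLikeCluster:
    """Toy N-replica cluster: put() returns after W synchronous writes;
    the remaining replicas get the data later, via a lagging queue."""
    def __init__(self, n=3, w=1):
        self.nodes = [Node("N%d" % (i + 1)) for i in range(n)]
        self.w = w
        self.lagging = []  # pending async replications: (node, key, value)

    def put(self, key, value):
        for i, node in enumerate(self.nodes):
            if i < self.w:
                node.data[key] = value                   # synchronous, counted in W
            else:
                self.lagging.append((node, key, value))  # asynchronous, applied later

    def replicate_one(self):
        # Apply one pending replication, if the target node is still alive.
        if self.lagging:
            node, key, value = self.lagging.pop(0)
            if node.alive:
                node.data[key] = value

    def get(self, key, r=1):
        # Read from R randomly chosen live replicas.
        for node in random.sample([n for n in self.nodes if n.alive], r):
            if key in node.data:
                return node.data[key]
        return None

cluster = DynamoLikeCluster(n=3, w=1)
cluster.put("anId", "someData")
# Right after the put, only N1 is guaranteed to hold the value.
while cluster.lagging:           # drain the asynchronous replication queue
    cluster.replicate_one()
# Once replication has run, any read, whatever the replica, sees the write.
```

With W=1 the client is acknowledged after a single node stores the data; the other two copies exist only once the background replication has run, which is exactly the window the failure scenarios below exploit.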

Eventual Consistency, failures and data loss

So, if we try our program on a Dynamo-like key-value database, with W=1 and R=1, what happens?
The simplest scenario, with 3 nodes N1, N2 and N3, is:

Client does a put on N1
Client does a get on N1
The program finishes.

Another one would be:
Client does a put on N1
Client does a get on N2: receives null
Client does a get on N1
The program finishes.

Or, also:
Client does a put on N1
Client does a get on N2: receives null
In parallel, the data was replicated from N1 to N2
Client does a get on N2: receives the data
The program finishes.

These scenarios are all fine. What if there is a failure? After all, that’s what we’re here for!
We could have, for example:
Client does a put on N1
N1 crashes after the put but before the replication
Client does a get on N2: receives null
Client does a get on N3: receives null

Will this end someday? If N1 never comes back to life, no: N1 failed before the data was replicated. In other words, you wrote an application relying on an eventually consistent storage, but a Dynamo implementation with W=1 is not an eventually consistent storage. More generally, a storage system that can lose data cannot be called an eventually consistent storage system.
Remember: failures do happen. That’s why you’ve chosen eventual consistency in the first place.
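The failure scenario above can be sketched with the same kind of toy model (hypothetical node objects; a sketch, not a real Dynamo client):

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True

# Three replicas, W=1: the put is acknowledged once N1 alone has the data.
n1, n2, n3 = Node("N1"), Node("N2"), Node("N3")
n1.data["anId"] = "someData"   # put acked by N1; replication to N2/N3 pending

# N1 crashes after the put but before the asynchronous replication runs:
# the only copy of the data, and the pending replication, die with it.
n1.alive = False

def get(node, key):
    return node.data.get(key) if node.alive else None

# Every subsequent read returns null; the client's retry loop never terminates.
print(get(n2, "anId"))  # None
print(get(n3, "anId"))  # None
```

Unless N1 comes back with its disk intact, no amount of retrying will ever make the value reappear: the write is lost, and with it the “eventually all accesses will return the last updated value” guarantee.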

So what?

This shows a strong link between consistency and durability. A consistency model does not allow for data loss; databases can lose data, hence the need for the ‘D’ in ACID. This is especially important in the ‘NoSQL’ / ‘big data’ worlds, as these systems are deployed on commodity servers that fail more often than your usual high-end server.

Is this a real world issue?

Well, if people spend so much time describing consistency models, it’s hopefully because they are useful and used in real life. The idea is that the developer knows the consistency model and can rely on it when writing applications. As such, the hidden implication of W=1 should be made explicit when describing the Dynamo model.

This requirement, writing synchronously to multiple nodes to get eventual consistency, is important. If the application needs to be partition tolerant, it also means that it must write synchronously to multiple racks, dramatically increasing write latency compared to a single-node write.
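The same toy model shows why a synchronous write to multiple nodes (here W=2) restores the guarantee: once two replicas hold the data before the client is acknowledged, losing one node no longer loses the write. Again a sketch with hypothetical names, not a real implementation.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True

nodes = [Node("N1"), Node("N2"), Node("N3")]

def put(key, value, w):
    # Write synchronously to W replicas before acknowledging the client.
    for node in nodes[:w]:
        node.data[key] = value

put("anId", "someData", w=2)   # acked only after N1 AND N2 store the data
nodes[0].alive = False         # N1 crashes, as in the previous scenario

# A read that reaches the surviving replicas still finds the value on N2.
value = next((n.data["anId"] for n in nodes if n.alive and "anId" in n.data),
             None)
print(value)  # someData
```

The price is latency: with W=2 across racks, every write pays at least one inter-rack round trip before it completes.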

Between datacenters, synchronous writes become costly. Asynchronous replication solves the performance part, at the price of losing some data, and with it eventual consistency, if a disaster occurs in one datacenter. Eventual consistency then cannot be used to describe the system as a whole. It’s something that Stonebraker already pointed out in [C10]: "Hence, eventual consistency cannot be guaranteed, because a transaction may be completely lost if a disaster occurs at a local cluster before the transaction has been successfully forwarded elsewhere."

This post is part of the consistency series.


[E1] G. DeCandia et al., "Dynamo: Amazon's Highly Available Key-Value Store," Proc. 21st ACM SIGOPS Symposium on Operating Systems Principles, 2007
[E2] Werner Vogels, "Eventually Consistent," ACM Queue, Vol. 6 No. 6, 2008
[E3] Peter Bailis, Ali Ghodsi, "Eventual Consistency Today: Limitations, Extensions, and Beyond," Communications of the ACM, Vol. 56, 2013
[C10] Michael Stonebraker, "Errors in Database Systems, Eventual Consistency, and the CAP Theorem," ACM Blogs, 2010
[V5] Doug Terry et al., "Managing Update Conflicts in Bayou, a Weakly Connected Replicated Storage System," SOSP 1995
