15.4. Nonblocking AlgorithmsLock-based algorithms are at risk for a number of liveness failures. If a thread holding a lock is delayed due to blocking I/O, page fault, or other delay, it is possible that no thread will make progress. An algorithm is called nonblocking if failure or suspension of any thread cannot cause failure or suspension of another thread; an algorithm is called lock-free if, at each step, some thread can make progress. Algorithms that use CAS exclusively for coordination between threads can, if constructed correctly, be both nonblocking and lock-free. An uncontended CAS always succeeds, and if multiple threads contend for a CAS, one always wins and therefore makes progress. Nonblocking algorithms are also immune to deadlock or priority inversion (though they can exhibit starvation or livelock because they can involve repeated retries). We've seen one nonblocking algorithm so far: CasCounter. Good nonblocking algorithms are known for many common data structures, including stacks, queues, priority queues, and hash tablesthough designing new ones is a task best left to experts. 15.4.1. A Nonblocking StackNonblocking algorithms are considerably more complicated than their lock-based equivalents. The key to creating nonblocking algorithms is figuring out how to limit the scope of atomic changes to a single variable while maintaining data consistency. In linked collection classes such as queues, you can sometimes get away with expressing state transformations as changes to individual links and using an AtomicReference to represent each link that must be updated atomically. Stacks are the simplest linked data structure: each element refers to only one other element and each element is referred to by only one object reference. ConcurrentStack in Listing 15.6 shows how to construct a stack using atomic references. The stack is a linked list of Node elements, rooted at top, each of which contains a value and a link to the next element. The push method prepares a new link node whose next field refers to the current top of the stack, and then uses CAS to try to install it on the top of the stack. If the same node is still on the top of the stack as when we started, the CAS succeeds; if the top node has changed (because another thread has added or removed elements since we started), the CAS fails and push updates the new node based on the current stack state and tries again. In either case, the stack is still in a consistent state after the CAS. CasCounter and ConcurrentStack illustrate characteristics of all nonblocking algorithms: some work is done speculatively and may have to be redone. In ConcurrentStack, when we construct the Node representing the new element, we are hoping that the value of the next reference will still be correct by the time it is installed on the stack, but are prepared to retry in the event of contention. Nonblocking algorithms like ConcurrentStack derive their thread safety from the fact that, like locking, compareAndSet provides both atomicity and visibility guarantees. When a thread changes the state of the stack, it does so with a compareAndSet, which has the memory effects of a volatile write. When a thread examines the stack, it does so by calling get on the same AtomicReference, which has the memory effects of a volatile read. So any changes made by one thread are safely published to any other thread that examines the state of the list. And the list is modified with a compareAndSet that atomically either updates the top reference or fails if it detects interference from another thread. 15.4.2. A Nonblocking Linked ListThe two nonblocking algorithms we've seen so far, the counter and the stack, illustrate the basic pattern of using CAS to update a value speculatively, retrying if the update fails. The trick to building nonblocking algorithms is to limit the scope of atomic changes to a single variable. With counters this is trivial, and with a stack it is straightforward enough, but for more complicated data structures such as queues, hash tables, or trees, it can get a lot trickier. A linked queue is more complicated than a stack because it must support fast access to both the head and the tail. To do this, it maintains separate head and tail pointers. Two pointers refer to the node at the tail: the next pointer of the current last element, and the tail pointer. To insert a new element successfully, both of these pointers must be updatedatomically. At first glance, this cannot be done with atomic variables; separate CAS operations are required to update the two pointers, and if the first succeeds but the second one fails the queue is left in an inconsistent state. And, even if both operations succeed, another thread could try to access the queue between the first and the second. Building a nonblocking algorithm for a linked queue requires a plan for both these situations. Listing 15.6. Nonblocking Stack Using Treiber's Algorithm (Treiber, 1986).
We need several tricks to develop this plan. The first is to ensure that the data structure is always in a consistent state, even in the middle of an multi-step update. That way, if thread A is in the middle of a update when thread B arrives on the scene, B can tell that an operation has been partially completed and knows not to try immediately to apply its own update. Then B can wait (by repeatedly examining the queue state) until A finishes, so that the two don't get in each other's way. While this trick by itself would suffice to let threads "take turns" accessing the data structure without corrupting it, if one thread failed in the middle of an update, no thread would be able to access the queue at all. To make the algorithm nonblocking, we must ensure that the failure of a thread does not prevent other threads from making progress. Thus, the second trick is to make sure that if B arrives to find the data structure in the middle of an update by A, enough information is already embodied in the data structure for B to finish the update for A. If B "helps" A by finishing A's operation, B can proceed with its own operation without waiting for A. When A gets around to finishing its operation, it will find that B already did the job for it. LinkedQueue in Listing 15.7 shows the insertion portion of the Michael-Scott nonblocking linked-queue algorithm (Michael and Scott, 1996), which is used by ConcurrentLinkedQueue. As in many queue algorithms, an empty queue consists of a "sentinel" or "dummy" node, and the head and tail pointers are initialized to refer to the sentinel. The tail pointer always refers to the sentinel (if the queue is empty), the last element in the queue, or (in the case that an operation is in mid-update) the second-to-last element. Figure 15.3 illustrates a queue with two elements in the normal, or quiescent, state. Figure 15.3. Queue with Two Elements in Quiescent State.
Inserting a new element involves updating two pointers. The first links the new node to the end of the list by updating the next pointer of the current last element; the second swings the tail pointer around to point to the new last element. Between these two operations, the queue is in the intermediate state, shown in Figure 15.4. After the second update, the queue is again in the quiescent state, shown in Figure 15.5. Figure 15.4. Queue in Intermediate State During Insertion.
Figure 15.5. Queue Again in Quiescent State After Insertion is Complete.
The key observation that enables both of the required tricks is that if the queue is in the quiescent state, the next field of the link node pointed to by tail is null, and if it is in the intermediate state, tail.next is non-null. So any thread can immediately tell the state of the queue by examining tail.next. Further, if the queue is in the intermediate state, it can be restored to the quiescent state by advancing the tail pointer forward one node, finishing the operation for whichever thread is in the middle of inserting an element.[7]
LinkedQueue.put first checks to see if the queue is in the intermediate state before attempting to insert a new element (step A). If it is, then some other thread is already in the process of inserting an element (between its steps C and D). Rather than wait for that thread to finish, the current thread helps it by finishing the operation for it, advancing the tail pointer (step B). It then repeats this check in case another thread has started inserting a new element, advancing the tail pointer until it finds the queue in the quiescent state so it can begin its own insertion. The CAS at step C, which links the new node at the tail of the queue, could fail if two threads try to insert an element at the same time. In that case, no harm is done: no changes have been made, and the current thread can just reload the tail pointer and try again. Once C succeeds, the insertion is considered to have taken effect; the second CAS (step D) is considered "cleanup", since it can be performed either by the inserting thread or by any other thread. If D fails, the inserting thread returns anyway rather than retrying the CAS, because no retry is neededanother thread has already finished the job in its step B! This works because before any thread tries to link a new node into the queue, it first checks to see if the queue needs cleaning up by checking if tail.next is non-null. If it is, it advances the tail pointer first (perhaps multiple times) until the queue is in the quiescent state. Listing 15.7. Insertion in the Michael-Scott Nonblocking Queue Algorithm (Michael and Scott, 1996).
15.4.3. Atomic Field UpdatersListing 15.7 illustrates the algorithm used by ConcurrentLinkedQueue, but the actual implementation is a bit different. Instead of representing each Node with an atomic reference, ConcurrentLinkedQueue uses an ordinary volatile reference and updates it through the reflection-based AtomicReferenceFieldUpdater, as shown in Listing 15.8. Listing 15.8. Using Atomic Field Updaters in ConcurrentLinkedQueue.
The atomic field updater classes (available in Integer, Long, and Reference versions) represent a reflection-based "view" of an existing volatile field so that CAS can be used on existing volatile fields. The updater classes have no constructors; to create one, you call the newUpdater factory method, specifying the class and field name. The field updater classes are not tied to a specific instance; one can be used to update the target field for any instance of the target class. The atomicity guarantees for the updater classes are weaker than for the regular atomic classes because you cannot guarantee that the underlying fields will not be modified directlythe compareAndSet and arithmetic methods guarantee atomicity only with respect to other threads using the atomic field updater methods. In ConcurrentLinkedQueue, updates to the next field of a Node are applied using the compareAndSet method of nextUpdater. This somewhat circuitous approach is used entirely for performance reasons. For frequently allocated, short-lived objects like queue link nodes, eliminating the creation of an AtomicReference for each Node is significant enough to reduce the cost of insertion operations. However, in nearly all situations, ordinary atomic variables perform just finein only a few cases will the atomic field updaters be needed. (The atomic field updaters are also useful when you want to perform atomic updates while preserving the serialized form of an existing class.) 15.4.4. The ABA ProblemThe ABA problem is an anomaly that can arise from the naive use of compare-and-swap in algorithms where nodes can be recycled (primarily in environments without garbage collection). A CAS effectively asks "Is the value of V still A?", and proceeds with the update if so. In most situations, including the examples presented in this chapter, this is entirely sufficient. However, sometimes we really want to ask "Has the value of V changed since I last observed it to be A?" For some algorithms, changing V from A to B and then back to A still counts as a change that requires us to retry some algorithmic step. This ABA problem can arise in algorithms that do their own memory management for link node objects. In this case, that the head of a list still refers to a previously observed node is not enough to imply that the contents of the list have not changed. If you cannot avoid the ABA problem by letting the garbage collector manage link nodes for you, there is still a relatively simple solution: instead of updating the value of a reference, update a pair of values, a reference and a version number. Even if the value changes from A to B and back to A, the version numbers will be different. AtomicStampedReference (and its cousin AtomicMarkableReference) provide atomic conditional update on a pair of variables. AtomicStampedReference updates an object reference-integer pair, allowing "versioned" references that are immune[8] to the ABA problem. Similarly, AtomicMarkableReference updates an object reference-boolean pair that is used by some algorithms to let a node remain in a list while being marked as deleted.[9]
|