-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove DCSS #4053
Remove DCSS #4053
Conversation
b8911ca
to
de0b797
Compare
kotlinx-coroutines-core/jsAndWasmShared/src/internal/LinkedList.kt
Outdated
Show resolved
Hide resolved
kotlinx-coroutines-core/concurrent/src/internal/LockFreeLinkedList.kt
Outdated
Show resolved
Hide resolved
de0b797
to
6b5cc95
Compare
Initially, I thought that 1a) also had new non-linearizable behavior: when Let's look at the linearizability points of
|
There's a lot of new changes here.
|
Investigating the test failures. |
e170b68
to
a3f5532
Compare
dab2195
to
3f27c6f
Compare
3f27c6f
to
4858fc5
Compare
This change is mostly a refactoring, except now, an arbitrary `onCancelling` handler that's not a child will not add itself in a `synchronized` block. Instead, only the root cause is read under a lock.
Before this change, when children were prohibited from adding themselves to the list, cancellation handlers were also prohibited from doing so. This was plain incorrect, because a list that's closed for new children could still get cancelled later, and also because being closed for new children happened before advancing the state to the final one.
`rootCause` is atomic anyway.
The failure went like this: * A child arrives. * In the meantime, the parent enters `tryMakeCompletingSlowPath` and remembers the current list of handlers, which is an empty or a single-element one. * The parent updates the state to the finishing one. * The child enters the list. * The parent traverses *an old list*, the one from before the child arrived. It sees no children in the empty/single-element list and forgets about the child. Why, then, was it that this worked before? It was because there was a guarantee that no new children are going to be registered if three conditions are true: * The state of the `JobSupport` is a list, * The root cause of the error is set to something, * And the state is already "completing". `tryMakeCompletingSlowPath` sets the state to completing, and because it updates the state inside `synchronized`, there was a guarantee that the child would see either the old state (and, if it adds itself successfully, then `tryMakeCompletingSlowPath` will retry) or the complete new one, with `isCompleting` and the error set to something. So, there could be no case when a child entered a *list*, but this list was something different from what `tryMakeCompletingSlowPath` stores in its state.
4858fc5
to
494d8ca
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job!
I'm yet to internalize a few things and revisit some tests, but in general it looks ready.
I also checked a few our benchmarks and everything seems smooth
kotlinx-coroutines-core/concurrent/src/internal/LockFreeLinkedList.kt
Outdated
Show resolved
Hide resolved
…List.kt Co-authored-by: Vsevolod Tolstopyatov <qwwdfsad@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Terrific work!
The internal implementation of `JobSupport` no longer uses the Double-Compare Single-Swap algorithm. Instead, the signal for the list to stop accepting this or that kind of elements is provided explicitly. In addition to simplifying the implementation somewhat, this change allowed us to more precisely define when child nodes should stop being accepted into the list, fixing a bug. Fixes #3893 Additionally, new stress tests are added to ensure the correct behavior.
This removes the DCSS, in exchange for giving up some linearizability.
Below is an overview of the implementation of the changes.
completed
.cancelling
or, if that doesn't happen,completed
. They haveonCancelling = true
.cancelling
andcompleted
. These areChildHandleNode
instances.NodeList
; notEmpty
, not a single node. Let's limit ourselves to this special case.NodeList
, we define a new operation,close
, that forbids some of the incoming elements from entering. Which elements are forbidden is passed as a parameter.close(completion)
closes the list completely. No new state changes can happen after completion, and so no new handlers are accepted.close(completion)
happens directly before completion notifications start to be sent out.close(cancellation)
closes the list for cancellation handlers. Children and completion handlers are still allowed to register their callbacks, as the state is not complete.close(cancellation)
happens after a job is cancelled, just before notifications about cancellation are sent out.close(children)
closes the list for children specifically. This happens after we notice that all children already in the list are completed; after that, we double-check if some new children managed to enter the list between when we saw that there were no non-completed children and when we actually closed the list. If there were new children after all, we proceed even with the list closed.addLastAtomic
usage (the one with less indentation), case 2 is the other one. First, here are the non-child cases:state is Incomplete && !onCancelling
. If the state changed from this to some otherIncomplete
state during insertion, interrupting it, we'd just have to restart the insertion. If the state changed to a non-Incomplete
state, with the DCSS approach, this would mean that the handler can no longer be inserted into the list. With the new approach, there'd still be a time window between the state changing and the list closing, but the behavior is still linearizable, because there is no way to know from outside theJob
that a new completion handler was installed even after the state was closed. IfinvokeOnCompletion
happens-after the moment when we know that the state has changed to the final one, theninvokeOnCompletion
will fulfill its contract and run the computation inline, and no other thread could have observed the intermediate state of the completion handler not having been installed despiteinvokeOnCompletion
winning the race with completion.state is Incomplete && state !is Finishing && handler !is ChildHandleNode && onCancelling
. We want to add the handler to the list so that it is notified about any sort ofcompleted
, includingcancelled
. In the worst case, in the meantime, the state could change toFinishing
and even almost becomecompleted
without closing the list; then, this case transforms into 2a. Otherwise, the list was closed for class 2 (or even everyone), and in both cases, the handler should be rejected.state is Finishing && onCancelling && rootCause == null && handler !is ChildHandleNode
. This means it's not yet time to cancel anything, and we can safely add the node to the list. In the meantime, the state could change to the final one, but this just means that the handler will be called soon-ish. This specific behavior is linearizable: since non-child completion handlers are not exposed via the API, there's no way to observe from the outside that the completion handler didn't manage to register before the state finalized: all that matters is that the handler does get executed in the end.Now, for how the children are handled. Because the child-related customization of
invokeOnCompletion
was too intrusive,attachChild
no longer even callsinvokeOnCompletion
, instead keeping its own copy of the code directly in the body ofattachChild
.state is Incomplete && state !is Finishing && handler is ChildHandleNode
. In this case, we're interested in registering the child forcancelling
andcompleted
. Essentially, we try adding it to the list if it's not closed for children.NonDisposableHandle
and, if the job completed with an exception, also invoke the handler with that exception.cancelling
permission, and only try adding it normally if that fails. If adding a child in time to be notified about cancellation fails but adding the child overall succeeds, we know that the job is cancelled: we witnessed a direct evidence of that.state is Finishing && onCancelling && handler is ChildHandleNode && !state.isCompleting
. In this case, we'd like to both put the child in the list and immediately notify it about exceptions (if there are any). With DCSS, if the state changed to be the final one, adding the element to the list would fail. We preserve this behavior by ensuring that the children are prohibited from entering the list even before the final state is reached, namely at the moment when children are queried.In case 1c, the following non-linearizable behavior is possible:
This non-linearizable behavior is arguably not worse than what was there before this PR: see #3893, which this PR fixes.