Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

resume/callback in cont, async, etc. should return a Boolean #3553

Open
armanbilge opened this issue Apr 24, 2023 · 14 comments · May be fixed by #3917
Open

resume/callback in cont, async, etc. should return a Boolean #3553

armanbilge opened this issue Apr 24, 2023 · 14 comments · May be fixed by #3917

Comments

@armanbilge
Copy link
Member

This Boolean should be similar to the Boolean returned when completing a Deferred i.e. it should indicate that this invocation of the resume/callback is the one and only one that actually completed the callback. Everyone else should get false.

This information about whether the callback succeeded would help avoid leaks in various concurrent state machines. Here are two that I encountered recently:

@durban
Copy link
Contributor

durban commented May 9, 2023

I think this is a good idea. I suspect it would('ve) make fixing #3603 almost trivial. (I assume the semantics would be that if a cancel won, the callback would return false.)

Something worth thinking about is how would this affect Deferred#complete: it already returns a bool, but that means a different thing.

@durban
Copy link
Contributor

durban commented Dec 17, 2023

To answer my own question: I don't think this should affect Deferred#complete. That is a similar, but different thing. With a Deferred, there is 0 or more subscribers, and it is "normal" not to wake up some of them. With async/cont/etc., there is one "subscriber": the fiber itself; and it is important whether we woke it up.

durban added a commit to durban/cats-effect that referenced this issue Dec 17, 2023
@durban
Copy link
Contributor

durban commented Dec 17, 2023

I've started working on this (#3917), and the desired semantics are not entirely clear.

What should resume return (true or false), if resume wins the race? get did not run (probably yet), and resume sets the result. This seems like it should be true, but I think nothing stops the fiber from being cancelled right after this. So it's not really true or false, but "we don't know (yet)". Waiting in resume is obviously not an option. (Also note, that get may not even be sequenced, so it may never run.)

Any ideas?

@durban durban linked a pull request Dec 17, 2023 that will close this issue
@armanbilge
Copy link
Member Author

but I think nothing stops the fiber from being cancelled right after this

Really? I would say that's a bug. If resume completes with a value (and returns true), then get must resume with that value. If you can cancel between these steps, then you can lose values.

But it depends what you mean by "right after this". If you mean steps after the get, that is fine.

@durban
Copy link
Contributor

durban commented Dec 17, 2023

Well, how about something like this:

      new Cont[IO, Int, String] {
        def apply[F[_]](implicit F: Cancelable[F]) = { (resume, get, lift) =>
          lift(IO(resume(Right(42)))) >> F.canceled >> get.map(_.toString)
        }
      }

With the current implementation this self-cancels (I think this is expected). But I think it could be cancelled at any of the >>s also. Which also seems "fine" (if a bit strange).

I think the point is, that currently when resume executes, it may have no way of knowing what will happen with the value it writes. E.g., because get did not run yet, or in fact it may never will run.

@armanbilge
Copy link
Member Author

Ok, thanks, I see your point now.

I think the point is, that currently when resume executes, it may have no way of knowing what will happen with the value it writes.

I think semantically the true means that if get runs, then this will be the value it returns.

@durban
Copy link
Contributor

durban commented Dec 22, 2023

I'm not sure that helps with the "lost value" problem. For example, let's see asyncCheckAttempt (async is of course similar):

      def apply[G[_]](implicit G: MonadCancel[G, Throwable]) = { (resume, get, lift) =>
        G.uncancelable { poll =>
          lift(k(resume)) flatMap {
            case Right(a) => G.pure(a)
            case Left(Some(fin)) => G.onCancel(poll(/*HERE*/get), lift(fin))
            case Left(None) => get
          }
        }
      }

Let's say, that resume runs first, writes the result, gets a true (if get runs, it will return that result). But then the fiber gets cancelled right before get is sequenced (marked with HERE in the code above)... and the result is lost. (But whoever called resume thinks it isn't lost.)

Arguably, this is a "bug" in asyncCheckAttempt. (Well, not in the current version, but in the version where cb returns a Boolean.) I'm still thinking how to fix this bug, but I'm also not sure the bug is really in asyncCheckAttempt...

@armanbilge
Copy link
Member Author

armanbilge commented Dec 22, 2023

Hmm, thanks for writing that out. Actually if I understand correctly, this seems to be a very general problem (not specific to asyncCheckAttempt).

  1. to avoid leaking values, we must be able to guarantee that get is sequenced
  2. to support cancelation, get must be cancelable

Currently there is no way to have a cancelable get that is guaranteed to be sequenced 🤔 I'm not even sure if such a thing is possible, without a tryGet and a combinator like onCancelMasked from #3474 (comment) to "defer" a cancelation signal to avoid leaking values.

@armanbilge
Copy link
Member Author

armanbilge commented Dec 22, 2023

dumb-idea-of-the-day

poll(/*HERE*/get)

what if we just disallowed cancelation there (i.e. change the implementation of uncancelable/poll)?

@durban
Copy link
Contributor

durban commented Dec 22, 2023

this seems to be a very general problem (not specific to asyncCheckAttempt).

Yeah. Although, I suspect that if we can solve it for asyncCheckAttempt, we can probably solve it for others too. The typical usages of cont seem to be quite similar, and asyncCheckAttempt seems to be the most general one.

Also worth noting, that this whole problem only occurs, if resume wins the race. If get runs first and suspends the fiber, that's fine. If the fiber is cancelled before resume runs, I think that's also fine (there is a cancellation check in resume).

what if we just disallowed cancelation there (i.e. change the implementation of uncancelable/poll)?

Yeah, I thought about that, but that has... consequences. If we look at this:

poll(/*1*/ x /*2*/)

The point 2 is already not a cancellation point. If we also make 1 not a cancellation point, that makes poll a no-op in certain cases (e.g., if x is pure, or a single delay, or something like that). This might not be a deal breaker, but I'm not sure.

@armanbilge
Copy link
Member Author

that makes poll a no-op in certain cases (e.g., if x is pure, or a single delay, or something like that). This might not be a deal breaker, but I'm not sure.

Yeah, honestly that seems fine to me 🤷 but I won't claim to foresee all the consequences :)

@djspiewak
Copy link
Member

Let's say, that resume runs first, writes the result, gets a true (if get runs, it will return that result). But then the fiber gets cancelled right before get is sequenced (marked with HERE in the code above)... and the result is lost. (But whoever called resume thinks it isn't lost.)

Bolded the relevant bit here.

So in this scenario, whoever called resume definitely knows the value is lost, because their finalizer is getting called. Consider this from the user side. I'll use async instead since it's a simpler signature:

IO.async(cb => IO(/* things that call cb */ Some(IO(/* finalizer */))))

Okay so if the code at /* finalizer */ runs in the above, we know definitively that any value (or error) which had been passed to cb is lost. At this point, we're responsible for sorting that out, either by recovering it somehow or getting an upstream to retry, etc. Whatever it takes really.

In some cases, this isn't reliably possible without sacrificing other system properties, so instead what often happens is people treat this as a nondeterminism (even though it isn't one) and dodge the recovery by instantiating async to Unit and solely using it as a latching mechanism (see: the new Queue implementations, for example). This avoids having to create weird recovery protocols.

But overall, this is something that users can cleanly respond to, unlike the cancelable gap at the end of poll regions, to which they have no response.

@durban
Copy link
Contributor

durban commented Jan 4, 2024

whoever called resume definitely knows the value is lost, because their finalizer is getting called

That is a very good point. (It would be "nicer", if the return value could give this information directly, but I'm beginning to think that might not be possible.)

Now the question is: could we build something (useful) with both the Boolean and the finalizer, which we couldn't already build with just the finalizer? (Because if not, then this issue is not really needed.)

durban added a commit to durban/cats-effect that referenced this issue Jan 31, 2024
@durban
Copy link
Contributor

durban commented Feb 3, 2024

So, about never cancelling (observing cancellation?) right inside poll, i.e.,

poll(/* we never cancel HERE */ x)

It is possible. In my draft PR (#3917), I did it (not in a very elegant way, as it's just an experiment for now). It seems to work. A few things needed fixing (notably in Resource), but nothing too terrible.

Edit: of course, this is yet another violation of the functor laws I think. The existing one is for poll(x /* HERE */), so it's not surprising.

durban added a commit to durban/cats-effect that referenced this issue Feb 17, 2024
durban added a commit to durban/cats-effect that referenced this issue Feb 23, 2024
durban added a commit to durban/cats-effect that referenced this issue Mar 7, 2024
durban added a commit to durban/cats-effect that referenced this issue Apr 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants