Remove suspicious wait set locking code #119

rotu · 2020-03-20T00:16:05Z

Since inuse is not volatile, it cannot be used instead of a lock.

eboasson · 2020-03-20T07:42:15Z

The way I read it, inuse is only ever used with CddsWaitset::lock held, so in that sense, I don't see a problem. However, merely accessing inuse correctly is not the same thing as it being used correctly, and there you may well have a point.

It serves a dual purpose: the first is to prevent concurrent calls to wait on a single wait set. The DDS standard forbids this (even if Cyclone doesn't) and so surely layers above RMW can't be using the same waitset from multiple threads at the same time. That test is therefore merely a sanity check.

Preventing this is useful because it guarantees that no-one is mucking about with the waitset while it is blocked in rmw_wait, allowing one to attach the entities in a very specific order and caching ranks in this order. This way, on return from dds_waitset_wait, it can relatively efficiently null out any entries that did not trigger. (The rmw_wait interface really is a disastrously bad one — and one that the Unix select call also suffers from and that has prompted umpteen different interface to try and avoid it …)

Secondly, there is the caching of attached entries — the assumption that the set of subscriptions, clients, services, guard conditions and events will usually not change from call to call, and that for realistic use-cases, it is cheaper to do compare arrays on entry than it is to attach/detach each entry every call. That's based on an experiment I once did (must've been a microbenchmark, like a hacked talker/listener pair). I would be good to re-do that check: the price paid in complexity is not insignificant.

The worst part of the complexity is having some rather delicate dependencies between entities, waitsets and behaviour of rcl. That's also the source of the "I'm assuming one is not allowed to delete an entity while it is still being used ..." comment in clean_waitset_caches, which states an unverified assumption on the allowed behaviours of rcl. It may be a reasonable assumption given that between rcl and rmw there's nothing but pointers, but that doesn't necessarily mean it is true.

The change you propose has an issue of itself as well: by extending the scope of CddsWaitset::lock across the call to dds_waitset_wait, it will make clean_waitset_caches block on calls to rmw_wait, and so delay the deleting of subscriptions, &c. That could be worked around by adding a guard condition for causing a spurious wakeup (and so on and so forth), but that's trading one piece of horror for another.

In short, I'm not quite ready to accept your PR. But if you'd have the ability to do a bit of measuring the cost of doing attach/detach for each call vs this caching strategy, that'd be great. Then we can either throw it away or we'll at least have decent evidence of the cost.

rotu · 2020-03-20T13:36:33Z

The issue I see is that the lock is only held for a short time, so it doesn’t prevent threads from using the same waitset. And I think the compiler can optimize out the write to inuse in a way that other threads never see it.

~~Will modify~~ now modified to prevent deadlock

rotu · 2020-03-20T15:23:57Z

I want to make sure I understand your aside "(The rmw_wait interface really is a disastrously bad one — and one that the Unix select call also suffers from and that has prompted umpteen different interface to try and avoid it …)".

The idea behind waitsets seems to be "I'm interested in when one of these conditions becomes true. I don't care which one" and having to iterate through (and god forbid poll) all the events is expensive. This is compared to an event queue, where, when a condition becomes true, that change becomes immediately visible with minimal work to see what has changed. I assume this is what "attaching" a condition to the waitset with dds_waitset_attach is basically doing - constructing that event queue.

And the specific horror of the rmw_wait function is that there's no way to tell, besides looking at all its arguments, whether a waitset has changed from one call to the next. Having an API that passes a waitset object and informs the rmw layer whenever it needs to change that waitset would be more to your liking. Actually, it looks like it should be able to do this in theory with the current API, except that Executor::wait_for_work empties and re-populates the waitset each time, even if it is actually unchanged.

eboasson · 2020-03-20T17:01:51Z

Yes, although I wouldn't call it an event queue: you wait until any of the attached conditions become true, and the idea is that the set of conditions you want to wait changes only rarely. On return, you get a list of all conditions that triggered. Typically it's very few of them, but there can be multiple.

The spec'd DDS API then returns the objects that triggered; Cyclone DDS instead returns an associated intptr_t to be used as one sees fit. Object handles in Cyclone are positive int32_ts, so can trivially be put in, but one can also use numbers that mean something, or even pointers. The idea is that you can, e.g., write a loop:

struct closure { ... };
{ struct closure *c = malloc(sizeof(*c)); c->func = mycallback; c->arg = myarg; }
dds_waitset_attach(ws, reader, (intptr_t) c);
nxs = dds_waitset_wait(ws, xs, lengthof(xs), timeout);
for (int32_t i = 0; i < nxs; i++) {
  struct closure *c = (struct closure *) xs[i];
  c->func (c->arg);
}

but this is definitely something that is not directly supported by the spec'd DDS API.

And all this matches exactly with what the upper layers of rcl are trying to achieve ... create a waitset, add some objects to it with callbacks you want to invoke when there's some work to do, then invoke them.

But instead of building on a sensible primitive (even the one in the DDS API is sensible, all it takes is a small map to locate the callback given the object pointer), it constructs that waitset object, deconstructs it into arrays of pointers to objects, passes those to rmw_wait for it to erase the pointers that did not trigger, then scans the arrays for the non-null entries, &c.

The RMW layers therefore have to assume the set of pointers can change from call to call; and one further has to assume that memory may be reused (hence the need for clean_waitset_caches).

Literally all of it could have been skipped ...

A good cleaning up would be valuable, but that amounts to a pretty massive rewrite of the ROS2 core executor code, so I guess it's not going to happen. Which leaves as the only realistic option extending the RMW interface to inform the lower layer of the attach/detach events.

eboasson · 2020-03-25T15:03:04Z

rmw_cyclonedds_cpp/src/rmw_node.cpp

@@ -2221,8 +2221,10 @@ static void clean_waitset_caches()
     used ... */
  std::lock_guard<std::mutex> lock(gcdds.lock);
  for (auto && ws : gcdds.waitsets) {
-    std::lock_guard<std::mutex> lock2(ws->lock);
-    waitset_detach(ws);
+    std::unique_lock<std::mutex> lock2(ws->lock, std::try_to_lock);


"try lock" is generally a tricky game: all a failure to lock proves is that the lock was held at the time of trying. By the time you start executing the next instruction it may already have been released. And I would argue that this gives you essentially the same behaviour as the original code.

In the original code there is the issue that the schedule:
T0: rmw_wait / set busy / do attach / wait
T1: delete X / clean_waitset_caches / skip waitset_detach
T0: wakeup / erase slots for non-triggering entities / clear busy
ends up with the (rmw) waitset entry cache containing a dangling pointer a deleted rmw entity [*]. What will happen most of the time is that a subsequent call to rmw_wait will notice a change in the set of pointers (because it does a memcmp, the fact that it is dangling is not really an issue, and by accident more than design, the events will squeeze through without dereferencing a dangling pointer, too).

The interesting bit is when we then continue as follows:

T1: create X', and it happens to get the same address as X

T0: rmw_wait
because the cache hasn't been invalidated and X is at the same address as X' (and presumably may moreover be in the same position in the arguments to rmw_wait) the cache checking logic can end up concluding that the cache is correct for the set being waited for, skipping the attach calls, and therefore not waiting for X' to trigger.

Your change to unconditionally call waitset_detach, blocking until a concurrent call to rmw_wait woke up, would fix that problem. But with this change, you're back to the original behaviour ... and thus get back the original problem.

But there may be a saving grace: it may (as my old comment noted) well be that ROS2 will never delete an entity while another thread is using it in rmw_wait. In that case, all this is a moot point. I think it is likely that this is the case, because at the RMW layer they're all simply pointers, not, e.g., weak references. (Cyclone doesn't make such assumptions in its API, but it exposes 31-bit random numbers as handles on its API instead of pointers. It doesn't guarantee anything about entity handle reuse, but it does use a decent PRNG and with the typical application having a few thousands handles at most, the likelihood of aliasing is pretty small.)

The other thing is that (as I mentioned before) the entire caching code exists because of a microbenchmark I once did. Perhaps it isn't worth it at all. If one simply rips it out, the problem disappears along with it.

(What would also work, and what I would consider doing if the caching is worth keeping, is to retain the busy flag, and depending on whether it is clear or set, either invalidate the cache or mark it as invalid/cleanup-deferred. And then, in rmw_wait, always do the "require reattach" path if it was invalid/cleanup-deferred.)

[*] It has been detached from the DDS waitset, that bit is not an issue.

I think there is not a mistake with how I'm locking here, and the code is correct as-is.

unique_lock and lock_guard are both raii objects. That is they continue to hold the lock until the object is destroyed. In the case of clean_waitset_caches, that's one iteration of the loop (no worry that the lock might be snatched from us during waitset_detach) In the case of rmw_wait, that's until the function returns. Again, no worry that someone else might acquire the lock.

I do think I can make this code much better by redoing how we cache, but that's a bigger PR.

eboasson

There is, like I tried to explain in my comment, a detail in how this interacts with a concurrent call to wait because of which it doesn't actually improve over what's currently in master. This not that it accesses things without holding a lock or anything, it is rather about the consequence of not calling waitset_detach when the waitset is being waited on. Other than the bad smell of a busy flag that's the only thing really wrong with the current code — and so the PR as it stands really just replaces that bad small by the smell of holding a lock across a wait operation.

In my view, there are two options to move forward:

Investigate whether there is any possibility of clean_waitset_caches running into a locked waitset where it actually needs to do any work. That function is called only when subscriptions or guard conditions get deleted and waitsets are only locked that function and rmw_wait. Ergo, it can ignore any locked waitsets if it is certain that the subscription or guard condition being deleted is currently in use in any waitset. If this is certain, you're effectively replacing the busy flag with a trylock with no downsides. I can live with that.
Investigate whether there is sufficient reason to do this caching in the first place. If there isn't, then I propose we throw the code being discussed away entirely. That would certainly remove a ton of code smell.

Remove suspicious wait set locking code

d05fced

Since inuse is not volatile, it cannot be used instead of a lock.

If a waitset can't be immediately cleaned up, skip it instead of waiting

5f0023a

eboasson reviewed Mar 25, 2020

View reviewed changes

use braces for constructor

deccdf7

rotu force-pushed the patch-10 branch from b0895df to deccdf7 Compare March 25, 2020 17:33

rotu requested a review from eboasson March 29, 2020 17:51

eboasson reviewed Mar 30, 2020

View reviewed changes

eboasson mentioned this pull request Apr 9, 2020

Switch to one participant per context model #145

Merged

audrow changed the base branch from master to rolling June 28, 2022 14:22

clalancette added the backlog label Jul 14, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove suspicious wait set locking code #119

Remove suspicious wait set locking code #119

rotu commented Mar 20, 2020

eboasson commented Mar 20, 2020

rotu commented Mar 20, 2020 •

edited

rotu commented Mar 20, 2020 •

edited

eboasson commented Mar 20, 2020

eboasson Mar 25, 2020

rotu Mar 25, 2020 •

edited

eboasson left a comment

Remove suspicious wait set locking code #119

Are you sure you want to change the base?

Remove suspicious wait set locking code #119

Conversation

rotu commented Mar 20, 2020

eboasson commented Mar 20, 2020

rotu commented Mar 20, 2020 • edited

rotu commented Mar 20, 2020 • edited

eboasson commented Mar 20, 2020

eboasson Mar 25, 2020

Choose a reason for hiding this comment

rotu Mar 25, 2020 • edited

Choose a reason for hiding this comment

eboasson left a comment

Choose a reason for hiding this comment

rotu commented Mar 20, 2020 •

edited

rotu commented Mar 20, 2020 •

edited

rotu Mar 25, 2020 •

edited