Entity enable #530

TheFixer · 2020-05-29T19:42:56Z

This PR implements delayed enabling of participants, publishers, subscribers, readers and writers using dds_enable() call. Delayed enabling allows qos changes to entities that before they are enabled, and that would result in DDS_RETCODE_IMMUTABLE_POLICY after the entity has been enabled.

In particular, this PR contains the following

The entity_factory.autoenable_created_entities QoS is implemented for participants, publishers and subscribers. This QoS indicates whether child entities for these must be automatically enabled (default) or disabled when created.
Implementation of the API call dds_enable() to enable entities that have been created disabled. This operation is a noop when applied to an already enabled entity
Delayed creation of readers and writers in ddsi so that disabled readers and writers are not discovered
A bunch of tests cases to validate the implementation of this feature

The following issues are still present:

Since there is no factory for participants in cyclone that has the autoenable qos it is currently impossible to create a disabled participant. Ofcourse this functionality can be built, but I just did not do it (yet)
And perhaps worse, there is one situation were I still encounter problems. This has to do with deleting a participant while there are disabled entities, this gives a timeout. There are test cases added to demonstrate this problem.

eboasson · 2020-06-12T14:20:06Z

@TheFixer I'm sorry it is taking so long to review. It hasn't slipped through the cracks, it is just that it needs and deserves a careful look.

TheFixer · 2020-06-12T15:44:01Z

No problem, @eboasson . I am still struggling a bit with properly cleaning up entities that have not been enabled yet. I suggest I keep struggling for a while while you spent your valuable time on more urgent issues.

TheFixer · 2020-10-28T13:11:51Z

@eboasson This PR implements delayed enabling of entities. A more detailed description of what has been done can be found in TheFixer@18f39fc. Would it be possible to review this PR?

TheFixer · 2021-02-04T13:37:01Z

It has been a few months ago that I spent a considerable amount of time to attempt to implement the dds_enable() and get it running on the (then current) master. So far I have not received any comments. Therefore, I am about to close this pull request.

eboasson · 2021-02-05T10:53:56Z

@TheFixer I am very sorry that it got ignored for so long. I don't know why I missed this:

@eboasson This PR implements delayed enabling of entities. A more detailed description of what has been done can be found in TheFixer/cyclonedds@18f39fc. Would it be possible to review this PR?

but it sure hasn't helped that it got lost somewhere in either my mailbox or my memory. As to the relevance of the PR, it is a rather important bit considering that content filtering is something that should happen at the reader level (rather than via "content filtered topics") and that the "enable" functionality is a key part of making it possible to set a content a filter on a reader before it starts receiving data. (The alternative would be setting it via a QoS, but that has some other downsides, at least at this time.)

So allow me to reopen the PR and make a priority of doing the review it deserves. We'll see how large the impact is of the changes that have been merged into master since then, but as they don't involve major changes to the basics of creating or deleting entities, I have good hopes it won't be too bad.

And next time, if you don't hear anything and are wondering what's going on, please don't wait so long and just ping me!

Signed-off-by: TheFixer <thefixer@iteazz.com> This pull request implements delayed enabling of entities, in particular of publishers, subscribers, readers and writers. Delayed enabling allows immutable qos changes to be changed as long as the entity is not enabled, but once the entity is enabled immutable qos changes are not allowed anymore. Delayed enabling is implemented by deferring the creation of ddsi entities as long as the entity is not enabled. The application can call dds_enable() on a disabled entity to enable it. Calling dds_enable() on an already enabled entity is considered a noop for the application. This pull request implements the following behaviour: - Participants are always enabled. - Created entities will be automatically enabled when their parent entity is enabled, and their parent has autoenable set to true. This is the default. - Calling dds_create_writer() on a disabled publisher which has autoenable=true will return a dds writer without a corresponding ddsi writer (and similar for readers) - Disabled entities can be enabled by calling dds_enable() on the entity. When the parent of the entity is not enabled, enabling will fail and return with DDS_RETCODE_PRECONDITION_NOT_MET. When the entity cannot be enabled because of qos mismatches or invalid security credentials, an error code will be returned. The call dds_enable() will return DDS_RETCODE_OK if is has enabled the entity and alls its disabled children that have been created while their parent has autoenable=true. If not all children are successfully enabled then dds_enable() will return with the error code of the first child that that could not be enabled. - Calling dds_enable() on an already enabled entity is a noop. - The m_iid is pre-allocated when a dds_entity is created, and is reused when the corresponding ddsi entity gets enabled. - Immutable QoS settings can be changed on an entity as long as the entity is not enabled. As a consequence the creation of some qos dependent structures (like whc) have been moved from the dds entity to the ddsi entity. - The whc is created when entity is enabled, not prior to that. The same holds for the xpack creation. - When readers that have no predefined rhc, the rhc will be created when the reader gets enabled (i.e., the call to dds_rhc_default_new() is done when the entity is enabled) - Readers that do have a predefined rhc will use this rhc when becoming enabled. - thread_state_awake() and thread_state_sleep() functions have moved to the location where entities are enabled. - The check for the availability of the security credentials has been moved to location where entities are enabled. This implies that a disabled dds writer or reader can be created before checking its security credentials. To not change legacy behaviour the creation of a dds reader or writer fails with DDS_RETCODE_NOT_ALLOWED_BY_SECURITY when the reader or writer is automatically enabled when created. Note that this can only happen for writers whose parent is enabled and have autoenabled set to true. - If an entity (e.g., subscriber) is enabled which causes recursive calls to dds_enable() on multiple children (e.g., readers), and some of these child entities are successfully enabled and others are not (e.g., because the some of the readers have the correct security credentials and others have not) then the dds_enable() on the parent will fail. It is the responsibility of the application to find out which of the entities have been enabled, and which have not. - When a reader is created with an implicit subscriber, then the implicit subscriber is enabled if and only if the autoenable setting of the participant is set to true. Similarly for the creation of a writer with an implicit publisher. - Various tests have been implemented. Notes for the reviewer. - If a participant has autoenable=false and an reader (or writer) with an implicit subscriber (or publisher) is created, I don't think it is possible to enable the implicit subscriber (or publisher)( anymore. I therefore think it would be wise to prevent the creation of a reader (or writer) with an implicit subscriber (or publisher) when the participant has autoenable=false, and return DDS_RETCODE_PRECONDITION_NOT_MET in that case. Do you agree? - Currently, I don't use locking when recursively enabling children. Please check. - I am not sure, but I have the impression that some security tests take much time. Not sure if it is caused by this pull request.

eboasson · 2021-02-08T10:22:59Z

@TheFixer the set of conflicts appeared small enough to make it worthwhile rebasing it before doing the review (the advantage is that I have keep track of fewer versions in my head), and so I did just that. I've made my rebased version available at https://github.com/eboasson/cyclonedds/tree/thefixer-entity-enable. The tests still pass locally.

A quick look at the changes I made suggests that it won't affect the review in a major way, so I can give review comments on the current version of this PR without complicating things. However, I think it would make the most sense to update the PR first, provided you agree with the way I resolved the conflicts in the rebased version.

TheFixer · 2021-02-09T07:17:25Z

Thanks @eboasson for taking the effort. I'll have a look at the rebase but don't expect any issues. I let you know when the updated pull request is available.

TheFixer · 2021-02-10T15:03:19Z

@eboasson I inspected the changes caused by the rebase and did not see any problems. The PR is now updated.

eboasson · 2021-02-10T18:56:38Z

@TheFixer thanks, that makes doing the review a bit easier. (And leaves me no excuse for procrastination!)

eboasson

@TheFixer as was to be expected, the review results in some comments. Some are really minor matters, some are more substantial.

What I haven't checked is whether indeed all functions that should fail if the entity they are invoked isn't enabled yet correctly return that error. I'll do that in a second round, I suppose.

Regarding the tests, I cannot help but wonder whether it would not be better to collect all "enable" tests in a single file and structure them a bit differently because a large overlap in the set of checks for the different entity kinds, and a bit of generic code along with some small functions to construct entities would likely result in significantly less code covering exactly same the ground.

Finally, I think that especially for the reader/writer tests it would make sense to also check that the built-in topics come out as they should (generated at the correct time, correct instance handles, stuff like that) and that remote nodes indeed observe the discovery events as a consequence of enabling the entity. (The latter is pretty obvious looking at the code, but the former not so much.)

eboasson · 2021-02-18T11:27:27Z

src/core/ddsc/include/dds/dds.h

+ * The dds_enable() operation will return DDS_RETCODE_OK if an entity and all its
+ * autoenabled children have been enabled successfully. In case one or more of
+ * its autoenabled children could not be enabled, an error code is returned.
+ * It is the responsibility of the application to find out which of the children


I'm not really a fan of this model — I'd rather see no change at all when the operation fails — but considering that it is a relatively rarely used operation and that I imagine failure will usually be treated as a grounds for program termination, I think this is fine for a first iteration.

eboasson · 2021-02-18T12:03:12Z

src/core/ddsc/src/dds_builtin.c

@@ -202,6 +202,23 @@ static struct ddsi_tkmap_instance *dds__builtin_get_tkmap_entry (const struct dd
  return tk;
 }

+static struct ddsi_tkmap_instance *dds__builtin_get_tkmap_entry_with_iid (const struct ddsi_guid *guid, const uint64_t iid, void *vdomain)


This function serves a rather unique purpose: new local entities now need to have an iid before being enabled, and in the way it is implemented in this PR before the GUID is assigned. In all other cases the iid gets assigned on mapping the key to an instance id, and for entities that GUID is the key.

Technically, the possibility exists that a GUID gets reused, there are after all but 2^24 entity ids (in Cyclone, arguably different entity kinds could use the same entity id). So a process that creates 2^24 entities (but for some rounding errors), deletes one of them (so there is again an entity id available), and then creates another entity will necessarily reuse the entity id of the entity that was just deleted. However the GUID-to-iid mapping may outlive the entity that was deleted.

Consequently, in the design in this PR, come the call dds__builtin_get_tkmap_entry_with_iid on the creation of the second entity with GUID guid, if the old GUID-to-iid mapping for that GUID still exists, you end up in a state where the mapping retains the old iid, whereas the entity object has the new iid. That'll lead to an observable difference (somewhere, I can't be bothered to figure out all the details). I am sure you'll agree this is an edge case, but edge cases are usually more interesting than regular ones 🙂.

Methinks the solution lies in not introducing this new path, but rather moving the GUID allocation (e.g., in new_reader) up in the stack (e.g., to dds_create_reader, or perhaps even to dds_entity_init) so that the GUID-to-iid mapping can always follow the same path and the potential discrepancy avoided.

(This has various effects throughout the PR, I'm not going to mark them, that's not worth the bother.)

eboasson · 2021-02-18T12:14:52Z

src/core/ddsc/src/dds_entity.c

-    e->m_flags |= DDS_ENTITY_ENABLED;
-    DDS_CERROR (&e->m_domain->gv.logconfig, "Delayed entity enabling is not supported\n");
-  }
+    rc = dds_entity_deriver_enable(e);


This does make some sense (enabling is definitely specific to the entity kind), but it doesn't actually dispatch to an implementation for all entity kinds (e.g., what about topics? — and if those don't need any special handling, dds_entity_deriver_dummy_enable simply returns "illegal operation", so is a rather incomplete operation).

eboasson · 2021-02-18T12:22:11Z

src/core/ddsc/src/dds_entity.c

+  /* enable all disabled children with the flag DDS_ENTITY_ENABLE_ON_PARENT */
+  for (c = ddsrt_avl_iter_first (&dds_entity_children_td, &e->m_children, &it); c != NULL; c = ddsrt_avl_iter_next (&it)) {
+    if ((!(c->m_flags & DDS_ENTITY_ENABLED)) && (c->m_flags & DDS_ENTITY_ENABLE_ON_PARENT)) {
+      ret = dds_enable((dds_entity_t)c->m_hdllink.hdl);


This is invoked by the entity kind-specific function on e while e's entity lock is held, but that means it locks the children's entity locks while holding the parent's entity lock, which breaks the rules. Perhaps it would have been better to allow locking a child entity while holding the parent's lock, rather than the other way around, but today you can only easily walk "upward" in the hierarchy.

So here is a risk of deadlock, however small (there are rather few operations that would actually try to lock a disabled entity, but certainly calling dds_enable on a publisher and on its writers in parallel is one way.)

The trick to do such a walk over entities typically involves unlocking & relocking the parent entity. See, for example, pushdown_pubsub_qos.

eboasson · 2021-02-18T12:26:39Z

src/core/ddsc/src/dds_entity.c

+      e->m_flags |= DDS_ENTITY_ENABLE_ON_PARENT;
+    /* do not enable if parent is not enabled or no autoenable */
+    if ((!(parent->m_flags & DDS_ENTITY_ENABLED)) || ((e->m_flags & DDS_ENTITY_ENABLE_ON_PARENT) == 0))
+      e->m_flags &= ~DDS_ENTITY_ENABLED;


I think rewriting the logic to set ENABLED once if and only if it really is enabled is much clearer than setting it and clearing it if turns out that it shouldn't be set.

eboasson · 2021-02-18T13:01:26Z

src/core/ddsc/src/dds_reader.c

-  thread_state_asleep (lookup_thread_state ());
-#endif
+  (void) dds_delete (reader);
+  goto del_implicit_sub;


I'm not against goto for error handling, but this moves just a little too far in the direction of "unstructured" programming to my taste!

(Also for the writer.)

eboasson · 2021-02-18T13:02:51Z

src/core/ddsc/src/dds_publisher.c

+   * returns PRECONDITION_NOT_MET */
+  if ((pub->m_entity.m_parent != NULL) && !dds_entity_is_enabled(pub->m_entity.m_parent))
+    return DDS_RETCODE_PRECONDITION_NOT_MET;
+  pub->m_entity.m_flags |= DDS_ENTITY_ENABLED;


These tests, setting the ENABLED flag and propagating to children, are a fundamental part of enabling entities. So in my view, these particular parts of enabling an entity should be part of the generic dds_enable, and not be part of the entity kind-specific functions.

eboasson · 2021-02-18T13:04:41Z

src/core/ddsc/src/dds_writer.c

-  /* FIXME: not freeing WHC here because it is owned by the DDSI entity */
+
+  /* not freeing WHC here because it is owned by the DDSI entity */
+#if 0


I don't see any value in leaving this #if 0/#endif block in.

eboasson · 2021-02-18T13:12:07Z

src/security/core/tests/common/test_utils.c

@@ -501,6 +501,9 @@ void rd_wr_init_fail(
      goto fail;
    dds_set_status_mask (*rd, DDS_SUBSCRIPTION_MATCHED_STATUS);
  }
+  dds_delete_qos (qos);


These three lines don't add anything of value, it means the same code is executed under all circumstances. I prefer the simpler variant.

eboasson · 2021-02-18T13:14:57Z

src/security/core/tests/secure_communication.c

@@ -448,7 +448,7 @@ CU_TheoryDataPoints(ddssec_secure_communication, multiple_readers) = {
    CU_DataPoints(size_t, 1, 3, 1, 3), /* number of participants per domain */
    CU_DataPoints(size_t, 3, 1, 3, 3), /* number of readers per participant */
 };
-CU_Theory((size_t n_dom, size_t n_pp, size_t n_rd), ddssec_secure_communication, multiple_readers, .timeout = 90, .disabled = false)
+CU_Theory((size_t n_dom, size_t n_pp, size_t n_rd), ddssec_secure_communication, multiple_readers, .timeout = 120, .disabled = false)


I don't understand why implementing dds_enable would affect the duration of this test. I'd prefer not to increase the timeout if there is no need, but I'm open to the possibility that 90s is just a bit short.

TheFixer · 2021-02-19T07:27:52Z

@eboasson Thanks for the review. I'll rework the PR and let you know when ready.

TheFixer force-pushed the entity-enable branch from a57637a to 289bc9a Compare May 29, 2020 20:01

TheFixer force-pushed the entity-enable branch from 78a41fc to 0a34a1c Compare October 5, 2020 14:28

TheFixer force-pushed the entity-enable branch from 6a4e77a to 18f39fc Compare October 28, 2020 09:57

TheFixer closed this Feb 4, 2021

eboasson reopened this Feb 5, 2021

TheFixer force-pushed the entity-enable branch from 18f39fc to 7d55316 Compare February 10, 2021 14:45

eboasson reviewed Feb 18, 2021

View reviewed changes

poetinger added the needs-triage Contributer attention required label Jun 15, 2022

poetinger assigned TheFixer Jun 15, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Entity enable #530

Entity enable #530

TheFixer commented May 29, 2020 •

edited

eboasson commented Jun 12, 2020

TheFixer commented Jun 12, 2020

TheFixer commented Oct 28, 2020

TheFixer commented Feb 4, 2021

eboasson commented Feb 5, 2021

eboasson commented Feb 8, 2021

TheFixer commented Feb 9, 2021

TheFixer commented Feb 10, 2021

eboasson commented Feb 10, 2021

eboasson left a comment

eboasson Feb 18, 2021

eboasson Feb 18, 2021

eboasson Feb 18, 2021

eboasson Feb 18, 2021

eboasson Feb 18, 2021

eboasson Feb 18, 2021

eboasson Feb 18, 2021

eboasson Feb 18, 2021

eboasson Feb 18, 2021

eboasson Feb 18, 2021

TheFixer commented Feb 19, 2021 •

edited

Entity enable #530

Are you sure you want to change the base?

Entity enable #530

Conversation

TheFixer commented May 29, 2020 • edited

eboasson commented Jun 12, 2020

TheFixer commented Jun 12, 2020

TheFixer commented Oct 28, 2020

TheFixer commented Feb 4, 2021

eboasson commented Feb 5, 2021

eboasson commented Feb 8, 2021

TheFixer commented Feb 9, 2021

TheFixer commented Feb 10, 2021

eboasson commented Feb 10, 2021

eboasson left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TheFixer commented Feb 19, 2021 • edited

TheFixer commented May 29, 2020 •

edited

TheFixer commented Feb 19, 2021 •

edited