Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor plugins in an ECS based way #114

Merged
merged 50 commits into from
May 29, 2024
Merged

Conversation

luca-della-vedova
Copy link
Member

@luca-della-vedova luca-della-vedova commented Jan 19, 2024

New feature implementation

Implemented feature

Based on #113, fixes #95, fixes #9.
Needs open-rmf/rmf_traffic_editor#483

Implementation description

This PR ports the codebase to Harmonic, drops Gazebo Classic support and takes the chance to remove the common / gz split, which allows us to cleanup the codebase a fair bit and take advantage of ECS patterns.
Specifically, when using Gazebo classic, we were forced to adopt the same architecture of one plugin per object, leading to large inefficiencies, for example for each door we would have one state publisher + one request subscriber that would check each incoming request.

This PR refactors plugins to be in an ECS based way. Now doors / lifts have a single simple plugin called register_component that adds a component that contains all their properties.
There is now a single site wide door / lift manager system that iterates through all entities with door / lift components and performs whatever logic is needed.

For a more indepth overview of the architecture, there is a FOSSAsia 2023 talk here.

Furthermore, state updates moved from being continuous to being event based, to reduce traffic flow and improve scalability.
Specifically, we use a new feature in ROS 2 Iron, Matched Callbacks, now:

  • Whenever a new subscriber is detected, we send the full state of the system, slightly throttled to avoid data floods (one publish per simulation update cycle).
  • Whenever a change in a door state is detected, we send a message with the new state.
  • When none of the above is true, no messages are sent, reducing load on downstream subscribers.
  • State QoS is now reliable with a greater history depth since now we can't really afford to lose messages.

Both these changes should greatly with scaling RMF to larger sites, since they will cut down both computation time and network traffic.

Bonus tasks:

  • Make lift updates event based as well.
  • Refactor dispenser to be ECS based and change to event based.
  • Refactor ingestor to be ECS based and change to event based.
  • Refactor slotcar to be ECS based.
  • Remove rmf_robot_sim_common package and unify plugins.

luca-della-vedova and others added 25 commits December 19, 2022 18:06
Signed-off-by: Luca Della Vedova <luca@openrobotics.org>
Signed-off-by: Luca Della Vedova <luca@openrobotics.org>
Signed-off-by: Luca Della Vedova <luca@openrobotics.org>
Signed-off-by: Luca Della Vedova <luca@openrobotics.org>
Signed-off-by: Luca Della Vedova <luca@openrobotics.org>
Signed-off-by: Luca Della Vedova <luca@openrobotics.org>
Signed-off-by: Luca Della Vedova <luca@openrobotics.org>
Signed-off-by: Luca Della Vedova <luca@openrobotics.org>
Signed-off-by: Luca Della Vedova <luca@openrobotics.org>
Signed-off-by: Luca Della Vedova <luca@openrobotics.org>
Signed-off-by: Luca Della Vedova <luca@openrobotics.org>
Signed-off-by: Luca Della Vedova <luca@openrobotics.org>
Signed-off-by: Luca Della Vedova <luca@openrobotics.org>
Signed-off-by: Arjo Chakravarty <arjoc@google.com>
Signed-off-by: Arjo Chakravarty <arjoc@google.com>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
This reverts commit b2ed3f4.

Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
@luca-della-vedova
Copy link
Member Author

To help speed up development I removed the event based logic in 045ed07, this will have a performance impact but should keep the behavior of the rest of the stack unchanged.

Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Copy link
Collaborator

@mxgrey mxgrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested this against 24.04 and rolling, and everything seems to be working.

I left some in-line comments for places where changes are needed, just to prevent segfaults and/or confusing crashes that could conceivably happen in some edge cases. If I find an opportunity, I'll make the changes myself and push them.

if (!joint)
_door_request_sub = _ros_node->create_subscription<DoorRequest>(
"door_requests", rclcpp::SystemDefaultsQoS(),
[&](DoorRequest::UniquePtr msg)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A capture by reference [&] is never what we want for a lambda that's going into a subscription. This gives the highest likelihood of accidental memory corruption with no benefit.

We should change this to capture-by-copy, and only include the specific items that we need to capture, making sure that we don't capture any raw pointers that could conceivably be destructed while this subscription is still in use.

rmf_building_sim_gz_plugins/src/door.cpp Show resolved Hide resolved
rmf_building_sim_gz_plugins/src/door.cpp Outdated Show resolved Hide resolved
if (info.paused)
return;

double t =
const double t =
(std::chrono::duration_cast<std::chrono::nanoseconds>(info.simTime).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This duration cast doesn't really make sense since it's casting a nanosecond-integer into a nano-second-integer (in other words, doing nothing) and then manually multiplying by the conversion factor to get seconds.

I recommend imitating the way duration cast is done in rmf_traffic or maybe just using rmf_traffic::time::to_seconds directly since rmf_simulation probably has a transitive dependency on it anyway.

const components::Name* name_comp) -> bool
{
double dt =
(std::chrono::duration_cast<std::chrono::nanoseconds>(info.dt).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This duration cast also doesn't do anything helpful.

const std::string& floor_name) const
{
std::vector<Entity> doors;
for (const auto& door_pair : lift.floors.at(floor_name).doors)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This risks an exception if it gets called for a floor that the lift doesn't serve. I recommend using .find(~) here and checking the result against .end().


_lift_request_sub = _ros_node->create_subscription<LiftRequest>(
"lift_requests", reliable_qos,
[&](LiftRequest::UniquePtr msg)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We must not use capture-by-reference for subscription lambdas. This should be replaced with explicit capture-by-copy.

std::unordered_set<Entity> finished_cmds;

const double dt =
(std::chrono::duration_cast<std::chrono::nanoseconds>(info.dt).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should fix this duration cast as well.

const std::shared_ptr<const sdf::Element>& sdf,
EntityComponentManager& ecm, EventManager& /*_eventMgr*/) override
{
if (sdf->HasElement("component"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing to change here, just a remark: This approach of serializing components in the SDF is probably how SDF should be getting used in general. It would be worth discussing upstream whether the SDF maintainers would consider first-class support for this pattern.

@@ -227,10 +192,12 @@ void SlotcarPlugin::init_obstacle_exclusions(EntityComponentManager& ecm)
{
c = ::tolower(c);
});
if (n.find("door") != std::string::npos ||
n.find("lift") != std::string::npos ||
if (ecm.Component<components::Door>(entity) != nullptr ||
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a much better approach to filtering these entities 👍 👍

Signed-off-by: Michael X. Grey <mxgrey@intrinsic.ai>
Signed-off-by: Michael X. Grey <mxgrey@intrinsic.ai>
mxgrey
mxgrey previously approved these changes May 23, 2024
Copy link
Collaborator

@mxgrey mxgrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made the recommended changes. Now it would be good if we can get CI working before merging, but if that's blocked by upstream issues that won't be resolved soon then I suppose we can move forward with merging and sort out the CI when things are ready upstream.

Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
@luca-della-vedova
Copy link
Member Author

Jazzy CI failure is expected until all rmf repos have a Jazzy brach

@luca-della-vedova luca-della-vedova merged commit d9e55c6 into main May 29, 2024
3 of 4 checks passed
@luca-della-vedova luca-della-vedova deleted the luca/ecs_plugins_sandbox branch May 29, 2024 12:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make lifts manage their own doors Remove Boost dependency
3 participants