[Bug] QoS incompatibility issue New publisher discovered on topic '/bot/firmware/drive_state/front', offering incompatible QoS. No messages will be sent to it. Last incompatible policy: RELIABILITY_QOS_POLICY #154

Open
nazmicancalik opened this issue Oct 30, 2023 · 6 comments
Labels: bug (Something isn't working)

Describe the bug

Zenoh peers have a QoS incompatibility issue when there are two subscribers with different ROS 2 QoS settings. Consider the following situation:

In the ROS 2 code there is one RELIABLE Publisher, plus one RELIABLE and one BEST_EFFORT Subscriber, all on the same topic. To our understanding, sometimes (when Zenoh discovers the BEST_EFFORT Subscriber's DDS entity first, rather than the RELIABLE one) it gives the following error:

New publisher discovered on topic '/bot/firmware/drive_state/front', offering incompatible QoS. No messages will be sent to it. Last incompatible policy: RELIABILITY_QOS_POLICY

In the Zenoh source we found a TODO saying that, when a new discovery is received, the existing QoS should be checked so that an incompatible one is not created.
Do you have any idea whether this is related, or why this problem might occur?

To reproduce

  1. Start a subscriber with BEST_EFFORT QoS to topic a.
  2. Start a subscriber with RELIABLE QoS to topic a.
  3. Start a publisher with RELIABLE QoS to topic a.
  4. Error occurs.

System info

Platform: Ubuntu 22.04.2 LTS, 64-bit
Processor: AMD Ryzen 9 5900HX with Radeon Graphics × 16
Zenoh-bridge-dds version: zenoh bridge for DDS v0.7.2-rc built with rustc 1.70.0 (90c541806 2023-05-31)

nazmicancalik added the bug label on Oct 30, 2023
nazmicancalik (Author) commented Oct 30, 2023

To recreate the issue, please run the following script:

    #!/usr/bin/env python3

    import rclpy
    import rclpy.node
    import rclpy.qos

    from std_msgs.msg import String


    class ReliablerSubscriber(rclpy.node.Node):
        def __init__(self):
            super().__init__('reliable_subscriber')

            # qos_profile_sensor_data requests BEST_EFFORT reliability
            self.sub1 = self.create_subscription(
                String, 'topic', self.callback,
                qos_profile=rclpy.qos.qos_profile_sensor_data)

            # qos_profile_parameters requests RELIABLE reliability
            self.sub2 = self.create_subscription(
                String, 'topic', self.callback,
                qos_profile=rclpy.qos.qos_profile_parameters)

        def callback(self, msg: String):
            self.get_logger().info('I heard: "%s"' % msg.data)


    def main(argv=None):
        rclpy.init(args=argv)
        node = ReliablerSubscriber()
        rclpy.spin(node)
        rclpy.shutdown()


    if __name__ == '__main__':
        main()
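
A RELIABLE publisher is also needed to trigger the error. The following companion sketch is not part of the original report (the node name, topic name and publish rate are illustrative); it makes the RELIABLE reliability explicit via a QoSProfile rather than relying on rclpy defaults:

    #!/usr/bin/env python3

    # Illustrative companion publisher (not from the original report):
    # offers RELIABLE reliability on the same 'topic' as the subscribers above.

    import rclpy
    import rclpy.node
    from rclpy.qos import QoSProfile, ReliabilityPolicy

    from std_msgs.msg import String


    class ReliablePublisher(rclpy.node.Node):
        def __init__(self):
            super().__init__('reliable_publisher')
            qos = QoSProfile(depth=10, reliability=ReliabilityPolicy.RELIABLE)
            self.pub = self.create_publisher(String, 'topic', qos)
            # Publish one message per second.
            self.timer = self.create_timer(1.0, self.publish)

        def publish(self):
            msg = String()
            msg.data = 'hello'
            self.pub.publish(msg)


    def main(argv=None):
        rclpy.init(args=argv)
        node = ReliablePublisher()
        rclpy.spin(node)
        rclpy.shutdown()


    if __name__ == '__main__':
        main()

Per the report above, the warning appears when the bridge happens to discover the BEST_EFFORT subscription first.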

JEnoch (Member) commented Nov 2, 2023

Hi @nazmicancalik,

You correctly analyzed the problem:
for efficiency and simplicity reasons, the bridge creates only 1 DDS Reader or Writer per Topic. The QoS used for those creations depends on the first DDS entity discovered on the Topic (via local discovery by default, or via remote discovery by another bridge in Forward Discovery mode).
Thus, it might happen that the bridge creates a Reader or a Writer with QoS that are not suitable for a Reader/Writer discovered afterward.

A solution could be the one proposed by @Fellfalla in #155 (thanks for this!): delete the Reader or Writer, and re-create it with "upgraded" QoS.

But I see several drawbacks to this approach:

  1. The Reliability QoS is no longer guaranteed, since some publications could be missed in the interval between the deletion and the re-creation of the DDS entity.
  2. In the case of a TRANSIENT_LOCAL topic, this could lead a Subscriber to receive duplicates of historical data, since on Reader re-creation the historical data will be received and routed again.
  3. It could cause a working configuration to stop working just by declaring a new Subscriber. Imagine the following scenario:
    • Start a publisher with BEST_EFFORT QoS to topic a.
    • Start a subscriber with BEST_EFFORT QoS to topic a.
      => compatible QoS, communication occurs via the bridge, all is fine
    • Start a subscriber with RELIABLE QoS to topic a.
      => the bridge on the publisher side upgrades the QoS of its DDS Reader to RELIABLE, which is incompatible with the BEST_EFFORT publisher. Thus communication stops.

The other solution I'm considering is to add the possibility to configure, in the bridge's JSON5 config file, the QoS to be used for certain topics. Those configured QoS would override the discovered ones.
Thus, when you know that a mix of QoS settings is used by different Publishers/Subscribers on some topics in your system, you can force the bridge to use the configured QoS.
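
To make the proposal concrete, here is a sketch of what such an override might look like in the bridge's JSON5 config. The key names below are purely hypothetical (the actual schema would be defined by the feature itself); only the plugins/dds nesting follows the bridge's usual config layout:

    {
      plugins: {
        dds: {
          // Hypothetical per-topic QoS override (illustrative only):
          // force the bridge's Reader/Writer QoS for matching topics,
          // instead of copying the QoS of the first discovered DDS entity.
          qos_overrides: [
            { topic: "/bot/firmware/drive_state/front", reliability: "RELIABLE" },
          ],
        },
      },
    }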

Do you think this is a suitable solution for your use case?

Fellfalla commented:

Hi @JEnoch, thank you for your feedback and thoughts 🙂!

To answer your general question: overriding QoS from the config is definitely a good and clean solution for us. However, it requires more thorough attention to all existing and new subscriber QoS settings, and is thus a bit less convenient. So, a few thoughts on your valid points:

  1. Is the reliability at the time a reliable subscriber is created actually relevant? It seems it is only possible to drop messages if an unreliable QoS exists and a new reliable QoS then pops up, so only messages for the best-effort subscriber might be dropped, or am I mistaken?
  2. Good point. This actually reminds me of something that already seems to be happening in our stack; I'm not sure if it is actually related. I can try to recreate an example setup.
  3. At least on the ROS 2 side, this is already an invalid system configuration, which breaks communication for the reliable subscriber. If we force RELIABLE via the config, the communication wouldn't work either.

Looking forward to hearing back from you!

JEnoch (Member) commented Dec 1, 2023

Is the reliability at the time a reliable subscriber is created actually relevant? It seems it is only possible to drop messages if an unreliable QoS exists and a new reliable QoS then pops up, so only messages for the best-effort subscriber might be dropped, or am I mistaken?

That's correct.

At least on the ROS 2 side, this is already an invalid system configuration, which breaks communication for the reliable subscriber. If we force RELIABLE via the config, the communication wouldn't work either.

Right, but in a pure ROS 2 system the best-effort Publisher to best-effort Subscriber communication will still occur,
while with the bridges, if we upgrade the bridge's Subscriber to RELIABLE, ALL communication on this topic will stop.

JEnoch (Member) commented Dec 1, 2023

After reflection, I think that, regarding the RELIABILITY QoS, it would make sense for the bridge to always create a RELIABLE Writer, whatever the discovered QoS. It then always matches any future Reader, whatever its reliability QoS.
And according to the Cyclone DDS team, this won't make a significant difference in the bridge's resource consumption.
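
As a side note (an editorial illustration, not from the thread; node and topic names are made up), the DDS matching rule this relies on can be checked with plain rclpy: a publisher offering RELIABLE is matched both by a subscription requesting RELIABLE and by one requesting only BEST_EFFORT.

    #!/usr/bin/env python3

    # Illustration of the DDS reliability matching rule: an offered RELIABLE
    # QoS satisfies both RELIABLE and BEST_EFFORT requests, so both
    # subscriptions below receive the published messages.

    import rclpy
    import rclpy.node
    from rclpy.qos import QoSProfile, ReliabilityPolicy

    from std_msgs.msg import String


    class ReliabilityMatchingDemo(rclpy.node.Node):
        def __init__(self):
            super().__init__('reliability_matching_demo')
            reliable = QoSProfile(depth=10, reliability=ReliabilityPolicy.RELIABLE)
            best_effort = QoSProfile(depth=10, reliability=ReliabilityPolicy.BEST_EFFORT)

            # Publisher offering RELIABLE reliability.
            self.pub = self.create_publisher(String, 'demo_topic', reliable)

            # Subscription requesting RELIABLE: compatible.
            self.sub_reliable = self.create_subscription(
                String, 'demo_topic',
                lambda m: self.get_logger().info('reliable sub: "%s"' % m.data),
                reliable)

            # Subscription requesting BEST_EFFORT: also compatible.
            self.sub_best_effort = self.create_subscription(
                String, 'demo_topic',
                lambda m: self.get_logger().info('best-effort sub: "%s"' % m.data),
                best_effort)

            self.timer = self.create_timer(1.0, self.publish)

        def publish(self):
            msg = String()
            msg.data = 'hello'
            self.pub.publish(msg)


    def main():
        rclpy.init()
        rclpy.spin(ReliabilityMatchingDemo())
        rclpy.shutdown()


    if __name__ == '__main__':
        main()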

I created #165 and eclipse-zenoh/zenoh-plugin-ros2dds#23 to track this, and that will fix this issue.

For other QoS I still think "on-the-fly" adaptation via deletion/re-creation of the Writer might cause trouble, starting with the loss of messages, for instance in the case where a RELIABLE+VOLATILE Writer has to be upgraded to RELIABLE+TRANSIENT_LOCAL.

For this, the possibility for the user to configure the QoS used by the bridge per topic (or per group of topics via a regex) is a better solution, as it is more flexible and allows fine-tuning for purposes other than QoS matching issues.
I created #166 and eclipse-zenoh/zenoh-plugin-ros2dds#24 to track this feature.

JEnoch (Member) commented Dec 11, 2023

@Fellfalla, now that #165 and eclipse-zenoh/zenoh-plugin-ros2dds#23 have been fixed, could you please check if that solves your issue (with either zenoh-bridge-dds or zenoh-bridge-ros2dds)?
