Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(kad): correctly handle NoKnownPeers error when bootstrap #5349

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

stormshield-frb
Copy link
Contributor

Description

After testing master, we encountered a bug due to #4838 when doing automatic or periodic bootstrap if the node has no known peers.

Since it failed immediately, I though there was no need to call the bootstrap_status.on_started method. But no doing so never reset the periodic timer inside bootstrap_status resulting in getting stuck to try to bootstrap every time poll is called on kad::Behaviour.

Notes & open questions

N/A

Change checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • A changelog entry has been made in the appropriate crates

@guillaumemichel
Copy link
Contributor

This looks a bit hacky, wouldn't it be better to modify the bootstrap Status instead (e.g poll_next_bootstrap)?

@stormshield-frb
Copy link
Contributor Author

This looks a bit hacky, wouldn't it be better to modify the bootstrap Status instead (e.g poll_next_bootstrap)?

I'm not sure to understand what you mean. on_started and on_finished are intended for that purpose.

Even if I would update the Status directly, we would not be able to remove on_started completely since the end user could still manually trigger a bootstrap, and we would not be able to remove on_finish at all since there is currently no way to detect a bootstrap has finished outside exploring query_finished or query_timeout. And since a bootstrap can also fail immediately, we have to handle that there.

I agree that it would feel better to have a passive way to learn that a bootstrap did start or finish but I don't see how to implement that in a reasonably simple manner.

The reason we need to know if a bootstrap as started or finished is because we don't want to cascade bootstrap requests. When a bootstrap is triggered (no matter if it was automatic, periodic or manual), we reset the automatic and periodic timer to their initial value.

@stormshield-frb stormshield-frb force-pushed the fix/automatic-bootstrap-bug branch 2 times, most recently from 67e9aca to ec9898f Compare May 2, 2024 07:42
@@ -931,6 +931,7 @@ where
/// This parameter is used to call [`Behaviour::bootstrap`] periodically and automatically
/// to ensure a healthy routing table.
pub fn bootstrap(&mut self) -> Result<QueryId, NoKnownPeers> {
self.bootstrap_status.on_started();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant is that if peers.is_empty() we may not want to call self.bootstrap_status.on_started() at all, because we aren't performing a bootstrap.

If a periodic timer needs to be reset within the Status then the timer reset should probably be implemented there rather than calling on_started(), because no actual bootstrap is started. Or maybe I didn't understand what the original issue is?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the long time to reply.

I do agree a little bit with what you say. However, we encountered another issue with this on another matter.

Before periodic / automatic bootstrap, the user could always react to a NoKnownPeers error because it was he that triggered the bootstrap. However, since now the bootstrap is triggered automatically, the user is never notified if a NoKnownPeers error occurs.

That is why in our fork we have now removed the special case where we check if our routing table is empty before triggering a bootstrap : we always trigger it. Doing so allows us to receive an OutboundQueryProgressed with empty stats, allowing us to learn that a bootstrap has failed. We can then react to it and do some stuffs.

If you agree with this change (removing the check for empty routing table), then I think we can close this conversation because we can call self.bootstrap_status.on_started() because we actually always trigger one.

What do you think ? Do you agree that there is no need to check for empty routing table before triggering a bootstrap ? (no other kad call does this by the way).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants