Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

False-alarm timeout exception when waiting for key exchange to complete #909

Open
rasantel opened this issue Nov 9, 2023 · 1 comment
Open

Comments

@rasantel
Copy link
Contributor

rasantel commented Nov 9, 2023

Occasionally, my sshj client hits a timeout exception almost as soon as the main thread starts waiting for the key exchange to complete, even though the timeout has not been reached.

Based on the logs and code, this can happen during the following sequence of events between the main thread and the reader thread:

[2023-11-06T21:59:52,561][DEBUG][net.schmizz.sshj.transport.KeyExchanger][sshj-Reader-localhost/127.0.0.1:22-1699307992560][KeyExchanger.handle] Received SSH_MSG_KEXINIT
[2023-11-06T21:59:52,561][DEBUG][net.schmizz.sshj.transport.KeyExchanger][sshj-Reader-localhost/127.0.0.1:22-1699307992560][KeyExchanger.startKex] Initiating key exchange
[2023-11-06T21:59:52,562][DEBUG][net.schmizz.concurrent.Promise][Test worker][Promise.tryRetrieve] Awaiting <<kex done>>
[2023-11-06T21:59:52,562][DEBUG][net.schmizz.concurrent.Promise][sshj-Reader-localhost/127.0.0.1:22-1699307992560][Promise.deliver] Setting <<kex done>> to `null`

This is a race condition. What happens under the covers is:

  1. The key exchange is initiated by the reader thread when it receives SSH_MSG_KEXINIT from the server before the main thread has a chance to initiate that exchange.
  2. Before the reader thread calls done.clear() in KeyExchanger.startKex, the main thread also calls KeyExchanger.startKex, skips the key exchange initiation (because it's already ongoing) and calls done.await with a timeout to wait for the exchange to complete.
  3. done.await calls Promise.retrieve which calls Promise.tryRetrieve, which waits for the associated condition via cond.await.
  4. The reader thread calls done.clear(), which delivers a value of null to the done promise.
  5. This causes cond.await to wake up the main thread, and Promise.tryRetrieve returns this null value.
  6. Promise.retrieve throws a timeout exception because the retrieved value is null, even though there wasn't a timeout.
@rasantel
Copy link
Contributor Author

rasantel commented Nov 10, 2023

Please disregard the mention above from PR #911. I referenced the wrong issue there (I meant issue #910, not #909).

By the way, I'm looking into the code to see if I can propose a fix and raise a PR for #909.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant