Topic Subscription exponential backoff not working properly (too many retries) #2235
We've managed to reproduce the problem, and we want to double-check the use case and the impact of the issue.
Thank you for addressing the issue. Let me explain our use case. We have a backend, deployed to an on-prem server, that subscribes to a topic. We noticed that from time to time (roughly every 2-3 days) the backend disconnects from the topic and does not reconnect automatically (it tries 10 times and fails), so the backend needs a manual restart to re-subscribe to the topic. We suspect the initial disconnect is caused by a temporary problem with the Internet connection.
It is a blocking problem. We built a workaround that monitors the topic subscription and re-creates it when necessary, but it is not a good solution; we would prefer the library to handle reconnects automatically.
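The workaround code itself is not part of this report. As a rough sketch of what such a watchdog could look like, assuming hypothetical names (superviseSubscription, subscribe, onFatalError, onMessage) and an assumed 250 ms base delay with the 20 s cap used later in this issue: it re-creates the subscription after a fatal error, backs off exponentially up to the cap, never gives up, and ignores a second error report for the same subscription.

```javascript
// Hypothetical watchdog, not SDK code. `subscribe` is any function that starts a
// subscription, calls `onFatalError` once it gives up, calls `onMessage` for each
// delivered message, and returns an object with an optional `unsubscribe()`.
function superviseSubscription(subscribe, options = {}) {
  const { baseMs = 250, maxBackoffMs = 20000, schedule = setTimeout } = options;
  let attempt = 0;
  let stopped = false;
  let current = null;

  function start() {
    if (stopped) return;
    let failed = false; // at most one restart per subscription, even if the error is reported twice
    current = subscribe({
      onMessage() {
        attempt = 0; // a delivered message means the connection is healthy again
      },
      onFatalError() {
        if (failed || stopped) return;
        failed = true;
        const delay = Math.min(baseMs * 2 ** attempt, maxBackoffMs); // capped exponential backoff
        attempt++;
        schedule(start, delay);
      },
    });
  }

  start();
  return {
    stop() {
      stopped = true;
      if (current && typeof current.unsubscribe === "function") current.unsubscribe();
    },
  };
}
```

The schedule option exists only so the timing can be replaced in tests; in a real process it defaults to setTimeout.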
It was easy to test the reconnects that way, repeatably and in isolation (without dropping my network access). It has nothing to do with our production setup: in staging and prod we use testnet and mainnet.
When the max attempts is set to a large number and:
No, not at all. We only used it to reproduce this issue. We typically use testnet for development and mainnet for production.
Thank you for the detailed explanation; understanding the use case is a key part of understanding the issue in depth. At this point we have only managed to reproduce it, and we will need more time to debug and fix it.
Description
What I am trying to do is set up a backend that subscribes to a Topic and, in case of an error, retries connecting indefinitely.
For testing, I set the max attempts to a large number and pointed the client to a non-existent local node, so that every connection attempt fails.
What I noticed is that the number of retries rises unexpectedly fast. For debugging, I added a timestamp log to the MirrorChannel.makeServerStreamRequest error handler in the library's TopicMessageQuery.js file, as well as a console log to the timeout handler in the same function.
Please see the attached log.
After roughly 3 minutes there are about 1000 retries, with a max backoff time of 20 s. For these parameters I would expect a maximum of about 15-20 retries.
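The expected figure can be checked with a little arithmetic. A minimal sketch, assuming each failure schedules exactly one retry after min(base * 2^n, maxBackoff), an assumed 250 ms base delay (the report states only the 20 s cap), and zero connection time:

```javascript
// Count how many retries fit into `windowMs` when the n-th retry waits
// min(baseMs * 2^n, maxBackoffMs) after the previous failure (assumed policy).
function expectedRetries(windowMs, baseMs, maxBackoffMs) {
  let elapsed = 0;
  let retries = 0;
  for (let attempt = 0; ; attempt++) {
    const delay = Math.min(baseMs * 2 ** attempt, maxBackoffMs);
    if (elapsed + delay > windowMs) return retries; // next retry would fall outside the window
    elapsed += delay;
    retries++;
  }
}

console.log(expectedRetries(3 * 60 * 1000, 250, 20000)); // 14
```

Under these assumptions the delays are 250 ms, 500 ms, ..., 16 s (7 retries by about 32 s), then 20 s each, giving 14 retries in 3 minutes: the same order as the 15-20 expected above, and nowhere near 1000.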
What I noticed is that the error handler is executed twice for each error: once with an Error class error, and a second time with a plain (non-Error) object. I suppose this duplicated execution of the error handler leads to the unexpected exponential rise in the number of retries.
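Why a duplicated handler would produce exponential rather than linear growth can be shown with a toy model. This is not SDK code and is not expected to reproduce the exact count from the log; it assumes every connection attempt fails immediately, that each handler invocation schedules its own retry, and that all invocations share one attempt counter. With one call per error, each failure spawns one retry; with two, each failure spawns two, so the number of pending retry chains doubles every backoff interval once the cap is reached.

```javascript
// Toy model, not SDK code. After each failure the error handler runs
// `callsPerError` times, and (assumption) every invocation schedules a retry.
function simulateRetries({ windowMs, baseMs, maxBackoffMs, maxAttempts, callsPerError }) {
  let attempt = 0;     // shared by all handler invocations (assumed)
  let retries = 0;
  const pending = [0]; // start times (ms) of queued connection attempts; 0 = initial connect

  while (pending.length > 0) {
    // Run the earliest queued attempt.
    let earliest = 0;
    for (let j = 1; j < pending.length; j++) {
      if (pending[j] < pending[earliest]) earliest = j;
    }
    const now = pending.splice(earliest, 1)[0];
    if (now > 0) retries++; // everything after the initial connect is a retry

    // The attempt fails and the error handler fires `callsPerError` times.
    for (let c = 0; c < callsPerError && attempt < maxAttempts; c++) {
      const delay = Math.min(baseMs * 2 ** attempt, maxBackoffMs);
      attempt++;
      if (now + delay <= windowMs) pending.push(now + delay); // keep only retries inside the window
    }
  }
  return retries;
}

const params = { windowMs: 180000, baseMs: 250, maxBackoffMs: 20000, maxAttempts: Infinity };
console.log(simulateRetries({ ...params, callsPerError: 1 })); // 14
console.log(simulateRetries({ ...params, callsPerError: 2 })); // orders of magnitude more
```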
Could you please look into this issue? It would also be great if you could add support for an indefinite number of max retries (perhaps via a null or -1 value?).
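The "indefinite retries" request could be handled at the option level. A minimal sketch, assuming hypothetical helpers normalizeMaxAttempts and hasAttemptsLeft (neither exists in the SDK), where null or -1 maps to Infinity as suggested above:

```javascript
// Hypothetical option handling, not the SDK's current API: as suggested in this
// issue, null or -1 means "retry forever".
function normalizeMaxAttempts(maxAttempts) {
  if (maxAttempts === null || maxAttempts === -1 || maxAttempts === Infinity) {
    return Infinity;
  }
  if (!Number.isInteger(maxAttempts) || maxAttempts < 0) {
    throw new RangeError("maxAttempts must be a non-negative integer, null, or -1");
  }
  return maxAttempts;
}

// The retry check stays a plain comparison; Infinity is never reached.
function hasAttemptsLeft(attempt, maxAttempts) {
  return attempt < normalizeMaxAttempts(maxAttempts);
}

console.log(hasAttemptsLeft(1000000, null)); // true
console.log(hasAttemptsLeft(10, 10));        // false
```

Unlimited attempts stay cheap with a capped backoff: for very large attempt counts 2 ** attempt overflows to Infinity, and Math.min still returns the cap, so the process retries at most once per max-backoff interval. That only holds if each failure schedules exactly one retry.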
Steps to reproduce
index.mjs
node index.mjs
Additional context
Hedera network
other
Version
v2.40.0
Operating system
Linux