Is this defect reproducible?
Yes. In roughly 1 out of 4 certificate changes, the client can fall into an infinite loop after nats-server restarts.
Given the capability you are leveraging, describe your expectation?
Under no circumstances should _read_loop fall into an infinite loop that never yields to the event loop. When it does, it blocks everything.
Given the expectation, what is the defect you are observing?
Hi,
Our SSL NATS client can fall into an infinite loop when certificates change (which restarts nats-server). We eventually traced the loop to _read_loop() in aio/client.py:
```python
while True:
    try:
        should_bail = self.is_closed or self.is_reconnecting
        if should_bail or self._transport is None:
            break
        if self.is_connected and self._transport.at_eof():
            err = errors.UnexpectedEOF()
            await self._error_cb(err)
            await self._process_op_err(err)
            break
        b = await self._transport.read(DEFAULT_BUFFER_SIZE)  # <----- b is always empty here
        await self._ps.parse(b)
    # ... except clauses omitted from this excerpt ...
```
Here b is always empty and the transport is at EOF. When this happened, all other asyncio tasks were blocked: at EOF the read returns immediately without suspending, so the loop never yields to the event loop.
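The reason the loop never yields can be shown with the standard library alone. This is a minimal sketch, not nats-py code: it assumes the client's transport behaves like asyncio.StreamReader, whose read() returns b"" immediately, without suspending, once EOF has been fed and the buffer is empty. The hypothetical spin() helper mimics the loop body while the EOF check is skipped, bounded to a fixed number of iterations so the example terminates.

```python
import asyncio


async def spin(reader: asyncio.StreamReader, iterations: int) -> int:
    # Hypothetical stand-in for the _read_loop body when the
    # is_connected-gated EOF check does not fire (status CONNECTING).
    reads = 0
    for _ in range(iterations):
        b = await reader.read(1024)  # at EOF: returns b"" without suspending
        assert b == b""
        reads += 1
    return reads


async def main() -> tuple[int, bool]:
    reader = asyncio.StreamReader()
    reader.feed_eof()  # transport reached EOF, buffer is empty

    ran = []

    async def other_task() -> None:
        ran.append("other")

    # Scheduled, but it only runs once main() actually suspends.
    t = asyncio.create_task(other_task())
    reads = await spin(reader, 100_000)
    starved = not ran  # True: other_task never got a turn during the spin
    await t  # this await suspends, so other_task finally runs
    return reads, starved


reads, starved = asyncio.run(main())
print(reads, starved)  # 100000 True
```

An await only hands control back to the event loop when the awaited coroutine actually suspends. Here nothing suspends, so other_task waits until the bounded spin ends; in the real _read_loop there is no bound, so other tasks never run.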
We hit this issue when certificates change. After a certificate change, nats-server restarts first. The NATS client, still using the old certificates, should detect the failure and eventually reconnect with the new ones. Instead, it falls into an infinite loop and blocks other tasks from running. This happens in roughly 1 out of 4 tries.
Has anyone else hit a similar issue? It looks like a bug to me: under no circumstances should _read_loop fall into an infinite loop that never yields.
Thanks
Steven
Forget about the certificate change; I can reproduce this easily. It can happen at any time once the conditions below are met.
In fact, I saw it once right after my app restarted, with no nats-server restart.
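When self._status == CONNECTING, self._transport.at_eof() is true, and b = await self._transport.read(...) returns empty bytes, _read_loop falls into an infinite loop that never yields. The EOF check in the loop only breaks out when self.is_connected is true, so while the status is CONNECTING it never fires. Once this happens, it blocks all other tasks from running.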
self._status can be CONNECTING during connect or attempt_reconnect(), so whether the bug triggers depends on timing: if the transport reaches EOF while the client is in the CONNECTING state and the read buffer is empty, the loop spins forever.
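One possible guard, sketched against a hypothetical ToyClient rather than the real nats-py client (the class, the Status enum, and the eof_seen flag are illustrative assumptions, and this is not the upstream fix): treat EOF, or an empty read, as terminal regardless of connection status, so the loop exits instead of spinning while CONNECTING. It assumes the transport behaves like asyncio.StreamReader.

```python
import asyncio
from enum import Enum


class Status(Enum):
    # Hypothetical subset of client states, named after the report.
    CONNECTING = 1
    CONNECTED = 2


class ToyClient:
    """Hypothetical stand-in for the client; not nats-py's real class."""

    def __init__(self, reader: asyncio.StreamReader, status: Status) -> None:
        self._transport = reader
        self._status = status
        self.eof_seen = False

    @property
    def is_connected(self) -> bool:
        return self._status == Status.CONNECTED

    async def _read_loop(self) -> None:
        while True:
            # Guard: treat EOF as terminal in every state,
            # not only when is_connected is true.
            if self._transport.at_eof():
                self.eof_seen = True
                break
            b = await self._transport.read(1024)
            if not b:
                # Defensive: an empty read also means EOF; never spin on it.
                self.eof_seen = True
                break
            # A real client would hand b to its protocol parser here.


async def main() -> bool:
    reader = asyncio.StreamReader()
    reader.feed_eof()  # EOF arrives while the client is still CONNECTING
    client = ToyClient(reader, Status.CONNECTING)
    await client._read_loop()  # returns instead of spinning
    return client.eof_seen


print(asyncio.run(main()))  # True
```

A real fix would also need to pass the EOF on to the client's error and reconnect handling, as the CONNECTED branch does with errors.UnexpectedEOF; that part is omitted from this sketch.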
Below is the change I used to trigger it. It simply forces a call to set_eof in asyncio/sslproto.py. Remember to run "echo>/var/log/robot/set_eof" before restarting your app.
What version were you using?
nats-py-2.3.1.tar.gz
What environment was the server running in?
Alpine Linux 3.18.2