Disconnects seem to be quite ungraceful #342

Open
karalabe opened this issue Oct 29, 2021 · 2 comments

@karalabe
Contributor

Been playing around with NSQ a lot lately and I keep hitting walls when trying to write test suites for assembling various network topologies. Most of the issues seem to stem from NSQD not properly handling consumer disconnects (I'm using go-nsq). I don't even know where to start describing the strange things:

  • When stopping a consumer, sometimes the CLS message is sent to NSQD, sometimes it is not.
  • Even if the CLS does get to NSQD, sometimes it seems not to respond with CLOSE_WAIT, but rather nukes the stream.
  • During shutdowns, the logs on both the consumer and broker side are full of error messages where one side or the other tries to read/write but the stream is already dead (no graceful disconnect).
  • Disconnecting the last consumer doesn't seem to decrement the client count of a topic/channel.
  • Disconnecting a consumer doesn't seem to abort/reschedule the in-flight messages for that consumer.

Seems to me that the entire shutdown pathway is very very wrong, and that various timeouts just hack around the root cause. E.g. the client heartbeats (or the lack thereof after a disconnect) are what trigger the cleanup of leftover client counts; the in-flight timeout is what reschedules messages not processed by a disconnected client.

I'm unsure if I'm doing something weird here, but it seems that NSQ is very very prone to weird behavior when I have very short lived connections.

@ploxiln
Member

ploxiln commented Nov 2, 2021

Honestly, we haven't historically worried much about clean client disconnects, and our tests don't pay attention to that in particular (just that messages go where they should go). We have a few existing issues about noisy logs related to as-clean-as-currently-possible disconnects ...

nsqio/nsq#521
nsqio/nsq#582
#103

> it seems that NSQ is very very prone to weird behavior when I have very short lived connections

That is plausible; it was not designed for short-lived TCP-protocol connections. But if you can offer some good fixes/cleanups for these cases, that would be great :)

@mreiferson
Member

I'm inclined to move this issue over to go-nsq, as I think go-nsq is likely the major contributing factor here.

@mreiferson mreiferson transferred this issue from nsqio/nsq Nov 28, 2021
@mreiferson mreiferson added the bug label Nov 28, 2021