Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Listen for more socket failure events #163

Closed
wants to merge 2 commits into from

Conversation

varju
Copy link

@varju varju commented Feb 19, 2022

Fixes #162.

I'm not sure if there's a better way to solve the problem, but listening for the close event seems to deal with the problem for me. If the host disappears suddenly, error doesn't seem to be called. I added disconnect just in case, but perhaps this is unnecessary.

varju and others added 2 commits February 18, 2022 16:58
The `close` event is always called after an `error` event (https://nodejs.org/api/net.html#event-error_1), and `disconnect` is not an event emitted from a `net.Socket`. So we should only wait for a `close` event (which captures both `error` cases and non-`error` cases such as the server cleaning closing the connection with a FIN). This will avoid calling error callbacks twice (I believe this isn't technically a problem at the moment, but only calling `Server#error` once for a close should be more maintainable).
@alevy
Copy link
Member

alevy commented Mar 2, 2022

@varju Thanks for the PR! I've added a commit that simplifies the chages to just use the close event, which should capture both errors and non-error closes. (disconnect is not currently an event emitted from net.Socket). Can you confirm this still fixes the problem in your scenario from #162?

@varju
Copy link
Author

varju commented Mar 3, 2022

Hi @alevy. Running my test case from #162 with your patch in place, and restarting the memcached container triggers an application crash with:

node:events:504
      throw er; // Unhandled 'error' event
      ^

Error: connect ECONNREFUSED 192.168.1.9:11211
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1157:16)
Emitted 'error' event on Socket instance at:
    at emitErrorNT (node:internal/streams/destroy:164:8)
    at emitErrorCloseNT (node:internal/streams/destroy:129:3)
    at processTicksAndRejections (node:internal/process/task_queues:83:21) {
  errno: -61,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '192.168.1.9',
  port: 11211
}

Node.js v17.5.0

As opposed to my previous attempt at fixing this, which reported several errors like the following, and then resumed when the server recovered:

MemJS: Server <pluto.varju.ca:11211> failed after (2) retries with error - connect ECONNREFUSED 192.168.1.9:11211
Loop failed Error: connect ECONNREFUSED 192.168.1.9:11211
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1157:16) {
  errno: -61,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '192.168.1.9',
  port: 11211
}

It's possible my example code lacks appropriate error handling. I confess that to some extent I'm button mashing when it comes to Node.

@saschat
Copy link
Member

saschat commented Mar 4, 2022

Additionally, I don't think the on close handler takes an error argument. We should probably separate the error and close handler. In the error handler just propagate the error and in the close handler deal with reconnection timeouts and so on. We just have to be careful since the error handling logic also touches reconnection variables (see line 63-67, are they even needed?).

@olemstrom
Copy link
Contributor

@saschat @alevy (as you two seem to be part of MemCachier org and can probably provide us with an update here):

Is memjs still maintained? Is there any possibility of getting this PR merged. I'm more than happy to clean it up if needed, in case @varju is unresponsive after the PR having been open for such a long time.

Services I'm currently working on are facing some issues with memjs, when the connection between memcached and our application is closed on Node 15+. When writing to the socket no error is emitted (<= node 14 does emit an error) by the net.Socket. As a result, memjs does not clear the connection information and does not re-establish the connection on the next write.

Listening socket.on('close') and clearing the connection there fixes the problems we are facing so getting this PR merged would be magnificent!

I'm more than happy to provide a reproducible test case if needed.

@saschat
Copy link
Member

saschat commented Mar 13, 2023

@olemstrom We semi-actively maintain it. This means we fix critical bugs and review PRs, but we currently don't have the capacity to actively work on it.

As for this PR, it is broken in its current form, it actually introduces a subtle bug. As per my comment above, close and error need to be dealt with separately. A cleanup or a new PR is absolutely welcome.

@olemstrom
Copy link
Contributor

@saschat Thanks for getting back to me! I have opened up a separate PR at #170

@saschat
Copy link
Member

saschat commented Mar 28, 2023

Obsolete since #170

@saschat saschat closed this Mar 28, 2023
@varju varju deleted the handle-disconnects branch May 14, 2024 00:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Application exits if memcached server restarts
4 participants