Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disordered Ack messages will be missed #103

Open
tagomoris opened this issue Oct 29, 2021 · 3 comments
Open

Disordered Ack messages will be missed #103

tagomoris opened this issue Oct 29, 2021 · 3 comments

Comments

@tagomoris
Copy link
Member

Currently, the write() function will wait for the ack message of the message sent. And reading ack messages from a connection will be processed in the order of written messages.
If the series of ack messages are disordered (it can happen), arrived messages will be missed and the original messages will be re-sent to the server. It is not a critical problem (because sending messages are retried), but it may cause a problem about highly heavier traffic with at-least-once configuration.

A possible solution could be (solution is not only this way, of course):

  1. write() will set the msg.ack value to a Set (with locking)
  2. write() lets a goroutine read response messages from the connection
  3. the goroutine will read a message from any of connections passed, and remove an AckResp.Ack from the Set (with locking)
  4. write() will wait until the ack value will be removed from the Set (or timeout)
  5. if it timed out, write() will retry to send the message
@tagomoris
Copy link
Member Author

This problem will last even after merging #82

@akerouanton
Copy link
Contributor

akerouanton commented Oct 31, 2021

@tagomoris Are you already working on this? 🙂

About the implementation details, I have the following solution in mind that would be lock-free:

  1. Start an ackRecv() goroutine in newWithDialer when Ack == true
  2. Use a dedicated channel (eg. named acksCh) to communicate between write() and ackRecv()
  3. Create an ack channel for each message in write() to communicate back from ackRecv() to write()
  4. Send a struct containing the ack ID, the ack channel and bool (with true value) from write() to ackRecv() (before actually writing to the connection, to avoid race conditions)
  5. Block the write() until either the ack channel or a timeout channel got triggered
    i. If the the timeout channel got triggered, resend the same struct as 4. but with the bool turned false.
  6. Concurrently read from the acksCh and the socket
    i. When an "ack request" is received from acksCh put it in a local map[string]struct{} if the bool is true or remove it if the bool is false.
    ii. When an ack response is received from the socket, check if there's a matching ID in the local map and send a struct{} to the appropriate channel and remove it from the local map.
  7. When Close() is called, acksCh got closed
    i. Set a local shouldClose bool to true in acksRecv()
    ii. When the last ackMessage is received or removed from the local map (cf. 6.) and shouldClose is true, exit from acksRecv()

My only concern with this solution is regarding #104.

@tagomoris
Copy link
Member Author

I've not worked on this yet - no plans for now.
@akerouanton Your idea was an alternative. I don't have enough knowledge about the creation cost of dedicated channels, so I can't evaluate which is better - but the idea of using channels seems a golang-ish solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants