Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(electric): Increase the ping heartbeat to 20 seconds from 5 seconds #1228

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

davidmartos96
Copy link
Contributor

@davidmartos96 davidmartos96 commented May 6, 2024

After some discussion with @kevin-dp we are under the assumption that 5 seconds of ping timeout is quite low. Other popular websocket libraries like https://socket.io use a ping interval of 25 seconds and a ping timeout of 20 seconds. The timeout default changed from 5 seconds to 20 seconds over the years.

On top of this change, we noticed that the scheduler can sometimes undershoot. That is, if we schedule after 5000 ms, we could then obtain a last_msg_diff of 4999.7 ms, which wouldn't match the > operator. Because of that, a small margin error of 500 microseconds is added.
That would prevent obtaining uneven ping durations -> 20s, 20s, 40s, 20s, 40s,... (That's because if it doesn't match, we will schedule for another 20 seconds)

@alco
Copy link
Member

alco commented May 14, 2024

On top of this change, we noticed that the scheduler can sometimes undershoot. That is, if we schedule after 5000 ms, we could then obtain a last_msg_diff of 4999.7 ms, which wouldn't match the > operator.

This is very suspicious. Are you sure it's not something else? Like, for example, an incoming message that's processed right after a new ping has been scheduled, causing the last_time_msg field to be updated. Timers can fire with a slight delay but they should never fire early.

By the way, taking a step back and squinting our eyes, we don't need to know the exact time difference. We know that the {:timeout, :ping_timer} message is processed once every 20 seconds. In order to decide whether to disconnect the client, we just need to know if any messages have arrived within that interval, either regular messages or a pong.

So instead of introducing an arbitrary 500-microsend timing margin, I would instead set the last_msg_time field to an atom like :ping_received and match on that in the handle_info() handler.

@icehaunter any objections?

@davidmartos96
Copy link
Contributor Author

@alco If I log the diff this is what I get. Most ones will be a bit over the target duration, but it can get a bit short sometimes.
But as you say, it might be something else.

image

image

@alco
Copy link
Member

alco commented May 19, 2024

@davidmartos96 @icehaunter Here's an alternative approach that, instead of adding a margin of error to timer firings, keeps a separate timestamp for sent pings. Every time a ping interval elapses, we check if any messages have arrived from the client after the previous ping had been sent. This way it doesn't matter whether the timer fires slightly earlier or slightly later.

@davidmartos96
Copy link
Contributor Author

@alco That looks easier, thanks! Should I pull those changes in, or can you commit on this branch?

A bit on topic with the pings, we found an odd behavior with the inactive websocket disconnection. I reported it the other day on Discord, but I don't know if you guys have it tracked on Linear.
https://discord.com/channels/933657521581858818/1240698235262337103/1240698237992833065

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants