Providing custom Tracker.list timeout #124

Open
sb8244 opened this issue Apr 29, 2019 · 2 comments · May be fixed by #127

Comments

@sb8244

sb8244 commented Apr 29, 2019

Tracker.Shard uses a GenServer.call that doesn't accept any options from the caller. This call can be expensive in a large shard or one under heavy load.

Is it worthwhile to expose a timeout option all the way through the function calls? My thought is to set a low timeout value when writing a listeners? function and then simply assume true if the call times out.

I'm happy to add this in but wanted to run it by the maintainers first. I could see a desire to expose this through all of the Tracker public APIs.
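A minimal sketch of what the caller side might look like. The tracker name and the third (timeout) argument to Phoenix.Tracker.list are hypothetical; today the list goes through GenServer.call with its default 5,000 ms timeout:

```elixir
defmodule MyApp.Presence.Helpers do
  @tracker MyApp.Tracker  # hypothetical tracker module name

  # "Is anyone listening on this topic?", assuming true when the shard is
  # too backed up to answer within the short timeout.
  def listeners?(topic, timeout \\ 100) do
    try do
      # The timeout argument is the proposed (hypothetical) extension;
      # Phoenix.Tracker.list/2 currently has no way to pass one through.
      Phoenix.Tracker.list(@tracker, topic, timeout) != []
    catch
      # A timed-out GenServer.call exits with {:timeout, ...}; treat it as
      # "assume there are listeners" instead of crashing the caller.
      :exit, {:timeout, _} -> true
    end
  end
end
```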

@michalmuskala
Contributor

The work inside the call itself is minuscule - the main part of obtaining the list is performed by the caller directly. So the only reason for timeouts there would be that the shard server is overloaded and has a very long message queue - in that case crashing callers is a very primitive back-pressure mechanism.
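To illustrate that point, here is a minimal sketch of the pattern being described; module, function, and message names are illustrative, not the actual phoenix_pubsub internals:

```elixir
defmodule Example.Shard do
  use GenServer

  def init(state), do: {:ok, state}

  # The reply itself is cheap: just hand the current state back.
  def handle_call(:state, _from, state) do
    {:reply, state, state}
  end
end

defmodule Example.Tracker do
  # The expensive part, extracting the presences for a topic, runs in the
  # calling process. A call timeout therefore only fires when the shard's
  # message queue is long, not because the list itself is large.
  def list(shard, topic) do
    state = GenServer.call(shard, :state)  # default 5_000 ms timeout
    for {^topic, key, meta} <- state.presences, do: {key, meta}
  end
end
```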

@sb8244
Author

sb8244 commented Apr 29, 2019

In my use case I am willing to accept the caller timing out and then just assuming "there are listeners" as a safe default. The 5-second default timeout is longer than I'd want, however, as the timeout length directly affects the time to deliver messages when the system is backed up. A tracker timeout on track calls will cause the client to reconnect.

I'm still looking at why my tracker process is backing up. I have ensured that there are no single large topics, and my 8 shards have a pretty evenly distributed reduction count. However, being able to customize the timeout would let me reduce the maximum message-delivery time in the worst-case scenario of a backed-up tracker shard.
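For diagnosing where the backlog is, a quick sketch using Process.info/2; shard_pids/0 is a hypothetical placeholder, since how you enumerate the tracker's shard processes depends on your setup:

```elixir
defmodule MyApp.TrackerDiagnostics do
  # Returns queue depth and work done per tracker shard process.
  def shard_stats do
    for pid <- shard_pids() do
      Process.info(pid, [:registered_name, :message_queue_len, :reductions])
    end
  end

  defp shard_pids do
    # Fill in with the pids of your tracker's shard processes.
    []
  end
end
```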

indrekj added a commit to indrekj/phoenix_pubsub that referenced this issue Jul 15, 2019
Previously, list and get_by_key had to go through the GenServer to acquire
the values ETS table and replica information. If the GenServer was
processing an update (e.g. heartbeat, track, untrack), list and
get_by_key were blocked until it completed. We saw this behaviour in
our cluster, where simple list/get_by_key calls sometimes took over a
few hundred milliseconds.

Storing replica information in an ETS table lets us avoid going through
the GenServer and process list/get_by_key immediately.

I removed the dirty_list function, which was not public / exposed and
which was trying to resolve the same issue. dirty_list was called dirty
because it didn't check down_replicas. This solution checks
down_replicas and doesn't change the API interface.

This should also resolve phoenixframework#124
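A rough sketch of the idea described in the commit message; the table name, keys, and helper functions are illustrative and not the actual implementation in the PR:

```elixir
defmodule Example.ShardMeta do
  # Illustrative only; not the actual phoenix_pubsub implementation.

  # Called when the shard starts: a public, read-optimised named table.
  def init do
    :ets.new(:tracker_shard_meta, [:named_table, :set, :public, read_concurrency: true])
  end

  # Called by the shard whenever its view of the replicas changes.
  def put_down_replicas(down_replicas) do
    :ets.insert(:tracker_shard_meta, {:down_replicas, down_replicas})
  end

  # Called by list/get_by_key in the *calling* process, avoiding a
  # GenServer.call into a possibly busy shard.
  def down_replicas do
    case :ets.lookup(:tracker_shard_meta, :down_replicas) do
      [{:down_replicas, down}] -> down
      [] -> []
    end
  end
end
```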
@indrekj indrekj linked a pull request Jul 15, 2019 that will close this issue