Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concurrency guarantees of connection events #1013

Open
aromaa opened this issue May 23, 2023 · 15 comments
Open

Concurrency guarantees of connection events #1013

aromaa opened this issue May 23, 2023 · 15 comments

Comments

@aromaa
Copy link

aromaa commented May 23, 2023

I'm raising this issue to get clarification on what the concurrency guarantees are supposed to be and showing the possible problems you could face with the current behavior due to the current implementations possible racy behavior. I'm focusing here for the cases where the kick-existing-players has been turned off as that is the default behavior. However turning that on has multiple additional quirks that could be discussed.

Currently we are unregistering the connection on teardown even before the DisconnectEvent has been fired as can be seen here. Technically it is now possible for the player to login and trigger events related to post Mojang authorization like PermissionsSetupEvent or LoginEvent. This could lead to situations where we initialize some data on login event but the disconnect event ends up incorrectly clearing them. The DisconnectEvent event seems to have note about that "Velocity typically fires this event asynchronously and does not wait for a response." and the current behavior seems somewhat intended but in my opinion this doesn't make sense for post Mojang authorization.

The solution is quite simple, move the connection unregistration after the event has fired. But this also has some implications like what if some bad behaving plugin doesn't complete the event? This would block all further logins from the player until the proxy would be restarted but this argument seems moot to me as correctness should be preferred. Another argument could be made that this is very unlikely due to the high delay between user actions. Dealing with this in a plugin would require comparing the actual Player references which is very inconvenient.

Fixing just the disconnection isn't enough as the connection registration is done after LoginEvent as can be seen here. This could result in somewhat similar race where the player logins, cancels it and then reconnects immediately. This is also somewhat unlikely due to the default login rate limiting of 3s.

The solution here is to move it to when the AuthSessionHandler gets activated. This also has some implications like the getPlayer API would return the Player much earlier and plugins might assume all the state has been already initialized at this point which is no longer true. This could be fixed by adding additional conditions where we would not return the player even if its present in the collection. Additionally previously cancelling the LoginEvent when kick-existing-players is set to true would not kick the currently active session but now it would.

Well that was a lot! I might have missed something but hopefully that sums up the current state for at least when the kick-existing-players is turned off..

@astei
Copy link
Contributor

astei commented May 24, 2023

It's not really that simple though. Most plugins do not even pretend to handle these cases.

As for DisconnectEvent, by the time it's called, the existing player's connection has been terminated. They crossed the Rubicon, you cannot pretend the player remains online.

If you want to be absolutely certain you are dealing with a specific Player instance and not some later connection for the same player, you have little choice but to compare the actual Player reference (and you can easily use a weak reference for this). Internally, this is exactly what Velocity does as well in VelocityServer.registerConnection() and VelocityServer.unregisterConnection(). We do not want to wait for connections to be torn down before allowing a new connection through.

This is a consequence of prioritizing resource clean up, Velocity reflecting the state of the world as it is at that moment, an acknowledgment that user plugins are often poorly written, a consequence of operating in a multi-threaded environment.

We don't offer guarantees, because it is difficult for us to provide one without compromising somewhere. If you want a guarantee, you have to do it within your plugin.

@aromaa
Copy link
Author

aromaa commented May 24, 2023

I understand that its not that simple and to be honest I'm expecting little to be done to this due to it requiring several changes how things work internally. I'm fine with this but unfortunately this isn't even as simple to handle in the plugin which ended up me raising this issue. It can done, yes, and I'm going to over few solutions and quirks they have. For me I would prefer the correctness as its unlikely to hinder the typical use case while protecting against possible bugs beyond the ones I can control while massively simplifying the event handling in the plugin side.

This is going to get long, but lets look at examples. Its way easier to demonstrate the trickery you have to do to actually get this right.

Lets start with the case that plugins typical tend to use.

private ConcurrentMap<UUID, User> users;

public void onLoginEvent(LoginEvent event) {
    this.users.put(event.getPlayer().getUniqueId(), new User());
}

public void onDisconnectEvent(DisconnectEvent event) {
    this.users.remove(event.getPlayer().getUniqueId());
}

The code suffers from a race where the disconnection can clear the user from the map after the login even has completed. To fix this, we can use the Player as the key and use Cache instead of ConcurrentMap.

private Cache<Player, User> users; //Weak keys

public void onLoginEvent(LoginEvent event) {
    final User user = new User();
    this.users.put(event.getPlayer(), user);
    if (!event.getPlayer().isActive()) {
        this.users.invalidate(event.getPlayer());
    }
}

public void onDisconnectEvent(DisconnectEvent event) {
    this.users.invalidate(event.getPlayer());
}

This works fine but we are limited to accessing the map by Player keys, let's introduce another map to allow accessing by UUID.

private Cache<Player, User> users; //Weak keys, eviction listener to clear usersByUuid on eviction just in case
private ConcurrentMap<UUID, User> usersByUuid;

public void onLoginEvent(LoginEvent event) {
    final User user = new User();
    this.users.put(event.getPlayer(), user);
    this.usersByUuid.put(event.getPlayer().getUniqueId(), user);
    if (!event.getPlayer().isActive()) {
        this.users.invalidate(event.getPlayer());
        this.usersByUuid.remove(event.getPlayer().getUniqueId(), user);
    }
}

public void onDisconnectEvent(DisconnectEvent event) {
    final User user = this.users.asMap().remove(event.getPlayer());
    this.usersByUuid.remove(user.getUniqueId(), user);
}

Now this looks good, or does it? The answer is a no! Consider the following scenario: LoginEvent A fired -> LoginEvent B fired -> LoginEvent B completed -> LoginEvent A completes. Now we have the reference of the old User in our usersByUuid, ouch!! To fix this we are forced to deny logins until the DisconnectEvent has been fired.

private Cache<Player, User> users; //Weak keys, eviction listener to clear usersByUuid on eviction in case
private ConcurrentMap<UUID, User> usersByUuid;

public void onLoginEvent(LoginEvent event) {
    final User user = new User();
    if (this.usersByUuid.putIfAbsent(event.getPlayer().getUniqueId(), user) != null) {
        event.setResult(ResultedEvent.ComponentResult.denied(Component.text("You are still logged in.")));
        return;
    }
    this.users.put(event.getPlayer(), user);
    if (!event.getPlayer().isActive()) {
        this.users.invalidate(event.getPlayer());
        this.usersByUuid.remove(event.getPlayer().getUniqueId(), user);
    }
}

public void onDisconnectEvent(DisconnectEvent event) {
    final User user = this.users.asMap().remove(event.getPlayer());
    this.usersByUuid.remove(user.getUniqueId(), user);
}

Unfortunately after going all that trouble there is still one problem to tackle. What if you are providing an API to retrieve the user by using something like getPlayer(UUID). You could additionally provide Velocity only API like getPlayer(Player) but this is very fragile. The 3th party plugin that relies on your library is still vulnerable to these racy conditions. Something more appealing would be to fire events like UserLoginEvent and UserDisconnectEvent to provide a layer on this middle ground where they can get correct reference to the User but its still up to the 3th party plugin to do the correct thing. They can't and shouldn't access the getPlayer(UUID) when trying to access the User inside a LoginEvent or DisconnectEvent.

Let's go with the the event based system. If we just fire them like Velocity does its connection events the 3th party plugin also needs to handle these racy conditions for the User object itself to make sure its state is correct. This adds additional friction and we might consider that maybe we should instead add concurrency primitives to our code to make sure these events are fired in order. Now we have a lot of complex code but we should be mostly covered.

What if we just bite the bullet and introduce a getPlayer(Player) to simplify our code but force the 3th party plugins to deal with it. Well.. if you are providing a cross-platform API and then you have to introduce one API for specific platforms that then gets very confusing and complicated to ship. This is not an option in my mind.

This does get a fairy bit more complicated when you start to introduce the I/O calls so you most likely want to change your usersByUuid to something like ConcurrentMap<UUID, CompletableFuture<User>> where you can manually complete the CompletableFuture<User> to make sure we don't spin up additional I/O calls while the existing one is in flight. Also additional logic and concurrency primitives might be needed to deal with the situation where we are tearing down the connection but our login is still performing I/O operations.

Hopefully this shines some light on what I'm mostly concerned about and my thought process on how to solve that on the plugin side itself. It is not perfect and getting some help and heavy lifting from the proxy side would be nice to have but its understandable if the current design is going to stay, I can live with it.

SIDE NOTE: This problem theoretically also exists for other async events but I'm not concerned about those. You can quite easily work this around by using semaphore and comparing the player references if that is absolutely necessary.

@astei
Copy link
Contributor

astei commented May 24, 2023

So, for your last code sample, you really should've solved it like this:

private Cache<Player, User> users; //Weak keys, eviction listener to clear usersByUuid on eviction in case
private ConcurrentMap<UUID, User> usersByUuid;

public void onLoginEvent(LoginEvent event) {
    final User user = new User();
    if (this.usersByUuid.putIfAbsent(event.getPlayer().getUniqueId(), user) != null) {
        event.setResult(ResultedEvent.ComponentResult.denied(Component.text("You are still logged in.")));
        return;
    }
    this.users.put(event.getPlayer(), user);
    if (!event.getPlayer().isActive()) {
        this.users.invalidate(event.getPlayer());
        this.usersByUuid.remove(event.getPlayer().getUniqueId(), user);
    }
}

public void onDisconnectEvent(DisconnectEvent event) {
    final User user = this.users.asMap().remove(event.getPlayer());
    // only remove from usersByUuid if the user instance equals the one for the specific player
    this.usersByUuid.remove(user.getUniqueId(), user);
}

Your current code has a bug in the case where kick-existing-players is true.

But more to the point, I don't have a good answer other than to isolate "online player state" from "offline player state" (which would be available irregardless of whether the player is actually online), and to not solely rely on a player being online to populate its current state, to minimize the blast radius here. That was the approach I wound up taking with the last server I worked with (Mineteria).

@astei
Copy link
Contributor

astei commented May 24, 2023

Of course, you could split the difference:

private Cache<Player, User> users; //Weak keys, eviction listener
private ProxyServer server;

public void onLoginEvent(LoginEvent event) {
    final User user = new User();
    this.users.put(event.getPlayer(), user);
}

public void onDisconnectEvent(DisconnectEvent event) {
    this.users.invalidate(event.getPlayer());
}

public Optional<User> getOnlinePlayer(final String username) {
    return server.getPlayer(username).map(this.users::getIfPresent);
}

public Optional<User> getOnlinePlayer(final UUID uuid) {
    return server.getPlayer(uuid).map(this.users::getIfPresent);
}

But you're still stuck with separating that "offline player state" either way.

@aromaa
Copy link
Author

aromaa commented May 24, 2023

Your current code has a bug

Ah, thanks. Seems like I forgot it from the disconnection events.

to not solely rely on a player being online to populate its current state

In my case the problem is more that there is no offline data to populate and more that its cleared and saved on disconnection. So when we accept new connection we should have empty state but without truncating the already existing session in case it has not been saved yet due to the delays in the event handling.

@astei
Copy link
Contributor

astei commented May 24, 2023

In my case the problem is more that there is no offline data to populate and more that its cleared and saved on disconnection. So when we accept new connection we should have empty state but without truncating the already existing session in case it has not been saved yet due to the delays in the event handling.

So use the Player instance from DisconnectEvent to fetch the user object, and then use that to save whatever state from User you need. Just keep in mind that there may be a connection that is relying on that data, so you should be very careful about caching data eagerly.

@aromaa
Copy link
Author

aromaa commented May 24, 2023

I know, hence the issue. There is no way to make this work with 3th party plugins unless you provide your own events and handle the synchronization yourself by blocking the logins and delaying disconnect events as I mentioned previously.

If there are no plans to alter the way these events are fired and the information I provided does not change that then we can go ahead and close this issue. I can deal with this myself in the plugin in the best possible way that I can to try to make this work.

I'm also happy to contribute any changes to these myself.

@ytnoos
Copy link

ytnoos commented Sep 29, 2023

@aromaa any news about this? I think it is quite embarrassing to have to do all that for a simple data loading/unloading of the player in a safe way (which still is not despite the attempts).

@aromaa
Copy link
Author

aromaa commented Sep 29, 2023

What I understood from the above conversation was that there was not much of interest to make changes to the current behavior.

This requires some discussion about how they would like these changes to be made as it does require more complexity to be added to the connection logic. There is also some open questions like would this be a opt-in config option or whatever a plugin can implicitly request this per connection. If this can be opted in implicitly do we want new events or how would plugin request this.

@ytnoos
Copy link

ytnoos commented Sep 30, 2023

I looked more closely at the code, and I'd have something to say about this:

Fixing just the disconnection isn't enough as the connection registration is done after LoginEvent as can be seen here. This could result in somewhat similar race where the player logins, cancels it and then reconnects immediately. This is also somewhat unlikely due to the default login rate limiting of 3s.

As you can see here the LoginEvent is not even going to be called (assuming kick-existing-players is set to false as default) since Velocity is checking if there is already any active connection right after the Player instance is made

If we assume what I'm saying is correct, then moving DisconnectEvent before server.unregisterConnection(this); would be enough to fix all the issues.
I know that astei may not like the idea of having a blocking disconnect event but in this case, we are not delaying any real connection closing since that is done few lines before, we are just delaying the removal of a username and a UUID from a map, which should be the actual right behavior.

I also wanted to point out that canRegisterConnection could just accept UUID and Username in order to instance ConnectedPlayer (and call a useless DisconnectionEvent) later to avoid referencing lots of useless objects, but this is just a minor performance improvement.

@aromaa
Copy link
Author

aromaa commented Sep 30, 2023

As you can see here the LoginEvent is not even going to be called (assuming kick-existing-players is set to false as default) since Velocity is checking if there is already any active connection right after the Player instance is made

The issue is that there is this small gab where the check can be passed while there is active connection. Think about a scenario where the player connects and the LoginEvent is still being processed, disconnects and then connects again. If the LoginEvent is being still processed due to timing issues then now you have two LoginEvents in flight for the same player.

@ytnoos
Copy link

ytnoos commented Sep 30, 2023

Oh, you're right. Then that has to be adjusted too. Maybe the idea of having a duplicate register connection check was to avoid unnecessary calls later. I think these two inconsistencies are a concept that needs to be fixed within Velocity itself rather than handled by a plugin; in my opinion, it is not a concurrency issue that the plugin has to handle. Being in a multi-threaded environment is no excuse for not having an eye for events call logic.

I am also in the same situation as you where I need to load and unload player data. Have you considered using the PostLoginEvent? In theory, it should have no problem when combined with the DisconnectEvent fix you proposed, the only flaw is that it is not a cancellable event.

@aromaa
Copy link
Author

aromaa commented Sep 30, 2023

Have you considered using the PostLoginEvent? In theory, it should have no problem when combined with the DisconnectEvent fix you proposed, the only flaw is that it is not a cancellable event.

That still doesn't fully solve the possible race with disconnect events clearing the data afterwards for a new connections.

I stuck with my solution that I described on #1013 (comment) (the last code block) which just prevents new logins in case there is a race as that should be a rare case anyways. One could add a lock that's released on disconnect so you can just wait for the previous connection to be released and then process if you would like to avoid kicking the user. I also have internal events to sync other plugins in between the login and disconnect events. Yes, its a mess and yes I would prefer not to do this but at least it solves all the problems.

@ytnoos
Copy link

ytnoos commented Sep 30, 2023

I don't see any issues in applying the Disconnect fix patch you mentioned, by moving the DisconnectEvent call before unregisterConnection, and then using this code:

private ConcurrentMap<UUID, User> usersByUuid;

public void onPostLoginEvent(PostLoginEvent event) {
    usersByUuid.put(event.getPlayer().getUniqueId(), new User())
}

public void onDisconnectEvent(DisconnectEvent event) {
    if(event.getLoginStatus() == DisconnectEvent.LoginStatus.SUCCESSFUL_LOGIN || 
       event.getLoginStatus() == DisconnectEvent.LoginStatus.PRE_SERVER_JOIN) 
        usersByUuid.remove(user.getUniqueId());
}

In this way:

  1. PostLoginEvent is impossible to be fired multiple times because it simply is after registerConnection.
  2. DisconnectEvent is going to clear data only if the real player has quit and not for any other failed Login attempt. I wanted to include PRE_SERVER_JOIN because a player could crash in the process of connecting to the first server or while being kicked and redirected to another one.
  3. (by implementing the Disconnect fix patch) Velocity will unregister the connection only after the DisconnectEvent is fired, so it won't be able to fire any PostLoginEvent until the DisconnectEvent finished.

Let me know if I'm wrong 😄

@aromaa
Copy link
Author

aromaa commented Sep 30, 2023

I don't see any issues in applying the Disconnect fix patch you mentioned, by moving the DisconnectEvent call before unregisterConnection

The side effect would be that players could receive a message that states that "They are already logged in" which was one of asteis concerns. That could be worked around by checking if the connection is being teared down and just wait for that to finish but this might soft-lock if some plugins misbehaves and never lets the event to complete so that's one downside but I don't think that's a big concern. Also there would be a behavior change in #getPlayer, #getAllPlayers and #getPlayerCount which can be worked around if that's necessary to keep as is.

But yes, the behavior would be correct with the disconnect fix patch for this scenario. Its not really perfect but its better than nothing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants