Matchmaking balancing seems broken #975

mankinskin · 2023-09-17T21:39:12Z

I regularly get into really unbalanced matchmaking games: my last game was 14% balanced, and the one before that was 57%. How can this be? It's really annoying to lose a team game just because of this.

The matchmaker should only allow games above a specific balance threshold, or at least not let premade teams break the balance. 14% balance is unacceptable and just a waste of time.

BlackYps · 2023-09-18T00:29:55Z

Please post some replay IDs so we can investigate the issue.

mankinskin · 2023-09-18T00:42:29Z

- 20806391 (4v4): interesting, because it now shows 71% (14% earlier) after one player apparently played their first game and won. He is now rated 1092, going from 0 to 1092 in one game (I'm talking about the player "Rigadoon" here).
- 20805901 (3v3, 74%): here my team was actually on the winning side.
- 20805770 (4v4, 57%)
- 20782487 (4v4, 20%)
- 20694871 (3v3, 71%)
- 20784272 (4v4, 51%): it seems a lot worse for 4v4 games.

You can look for more in my replays: longhead

Another related problem is when one team has a lot of its rating concentrated in one player. Whether this is an advantage or a disadvantage depends on the map, but it almost always feels unbalanced, even when the balance rating is above 90%.

mankinskin · 2023-09-23T16:32:29Z

A way to solve this might be to require people to play the 1v1 ladder and use that as a proxy for their TMM ratings when they have 0 TMM games. Then games with balance below a threshold should not be allowed. Also, the distance of each player from the average rating should affect the balance rating. When there is one 2000 player and a few 500 players vs a few 1000 players, that may be an advantage or a disadvantage depending on the map. Generally, more mexes means the 2000 can expand much more and dominate the 1000 players; few mexes means the 500s will block the mexes and not use their eco as efficiently as the 1000s. Either way the game is unbalanced, so it should not be rated as well balanced.

BlackYps · 2023-09-25T11:18:55Z

We already use the global rating as a baseline to initialize TMM ratings. That is why the first game you linked seemed to show a rating jump from 0 to 1092. In reality the displayed 0 rating is a visual bug: his rating was already 1092. That's why the match launched: the actual balance was good, even though the game reported it as low because it wrongly treated that player as 0-rated.

> And the distance of each player to the average rating should affect the balance rating.

This is already implemented

- 20805901: I see no issue here. There is less than 500 rating difference between the players, which is generally regarded as fine.
- 20805770: unbalanced by 150 rating, with less than 400 rating difference between the players.
- 20782487: I can see the issue; the ratings of the players are spread out a lot.
- 20694871: same here.
- 20784272: same here.
The thing is that we don't know if some of these players are premade, but regardless of that, we have a tradeoff between match quality and waiting times. That also explains why it is worse in the 4v4 queue, which is just a lot less active. On top of that, new players get more lenient matching, so especially in the low rating ranges, games that are not ideal (at least on paper) can launch.

mankinskin · 2023-09-25T15:28:23Z

> we don't know if some of these players are premade

but that must be possible to determine during balancing?

> tradeoff between match quality and waiting times

Yes, that is obviously an issue. I see at most 20 players in the matchmaking queue at a time, so ideal balance will not always be possible.

One issue I think should be addressed, though, is that games are rated as highly balanced when there is a huge imbalance within the teams. Especially on point-mirrored maps, where the symmetric spots don't actually play against each other, this can turn out extremely unfair, because the best player of each team plays against the worst player of the other team.

In general, the balance within the teams should be factored into the rating, which would hopefully give somewhat more balanced games.

On top of that, it might be a nice client feature to let you set your minimum balance rating. That way players could decide how long they are willing to wait for a better game. But that is probably a bigger change.

Also, when one team's ratings are very uneven and the other's are even, that creates an unfair setting too, depending on the map.

BlackYps · 2023-09-25T16:49:50Z

> but that must be possible to determine during balancing?

What I am getting at is that a high-rated and a low-rated player ending up in the same game might be because they queued together as premades.

TrueSkill doesn't take rating distribution into account in the way you describe, which is a limitation we have to work with, but in the long term it doesn't matter because these unfairnesses will balance out. Sometimes you are favoured, sometimes you are not, but in the end you will converge on a rating that fits you.

Being able to set a minimum balance rating sounds like a nice feature on the surface, but it is infeasible and also not desirable. This feature has been discussed at length, and the conclusion was that it is just not viable.

I know that some matches feel suboptimal, but as I said, it is a game of tradeoffs, and I feel we are pretty much at a local optimum right now.

mankinskin · 2023-09-25T16:58:17Z

> in the long term it doesn't matter because these unfairnesses will balance out

That may be true, but the issue is not really getting an inaccurate rating; it's having unfair games. It's not fun, no matter which side you are on. An unbalanced game will either be frustrating because you lose or boring because you win easily. It's primarily a game quality issue.

> TrueSkill doesn't take rating distribution into account in the way you describe, which is a limitation we have to work with

That is unfortunate, because I feel like that is the one thing that could really be improved. It is impossible to know how well someone will play when they have zero games, and when there are no players in the queue you can't make a better game. But making unbalanced games just because TrueSkill doesn't detect the imbalance seems unnecessary. Possibly there are alternative balancing methods that take intra-team balance into account? Otherwise, I imagine it could be simple enough to scale the balance rating by the inverse of each player's distance from the average rating of all players, so that more "spread out" ratings are discouraged. It seems like an option that could at least be tested.
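A rough sketch of the inverse-distance idea, for illustration only. The function name, the use of mean absolute distance, and the `scale` constant are all hypothetical choices, not anything from the FAF server; as BlackYps notes later in this thread, the real matchmaker already penalizes spread in its own way.

```python
from statistics import mean


def spread_penalised_balance(balance, team_a, team_b, scale=250.0):
    """Shrink a 0..1 balance score as player ratings spread out.

    Hypothetical helper: `scale` (rating points) is an illustrative tuning
    constant; with scale=250, an average distance of 250 from the mean
    halves the score.
    """
    ratings = list(team_a) + list(team_b)
    average = mean(ratings)
    # Mean absolute distance of every player from the match-wide average.
    spread = mean(abs(r - average) for r in ratings)
    # Inverse-distance factor: 1.0 when all ratings are equal, smaller as spread grows.
    return balance / (1 + spread / scale)


even = spread_penalised_balance(0.9, [1200, 1200, 1200], [1200, 1200, 1200])
spread = spread_penalised_balance(0.9, [500, 1500, 1500], [1200, 1200, 1200])
print(even, spread)
```

The even match keeps its 0.9, while the 500/1500/1500 team drags the same nominal balance below 0.5.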

BlackYps · 2023-09-25T17:16:28Z

I'm not sure if we are talking about the same thing. Do you mean the rating changes after a game or how people get selected for games?

mankinskin · 2023-09-25T18:06:49Z

How people get selected for games. I don't really care what rating I end up with; I would just like to be matched into games that are as balanced as possible.

BlackYps · 2023-09-25T18:08:26Z

The matchmaker already discourages matches where the ratings are spread out.

mankinskin · 2023-09-25T18:09:32Z

Oh really? Okay, because some of these games were still rated very highly in terms of balance.

BlackYps · 2023-09-25T18:11:57Z

Ignore the balance percentage. That is the TrueSkill balance metric, which does not account for rating spread, and it is not used by the matchmaker. The matchmaker uses a custom algorithm to determine how balanced a game is.
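To see why the displayed percentage cannot react to spread, here is a sketch of the standard TrueSkill match-quality formula specialized to two teams. It assumes the client's percentage is this quantity; the `beta` value and the player sigmas are placeholders, not FAF's actual parameters. Only the difference of the summed means and the summed variances appear, so two teams with equal totals get the same score however their ratings are distributed.

```python
import math


def two_team_quality(team_a, team_b, beta=250.0):
    """TrueSkill match quality for two teams of (mu, sigma) players.

    beta=250 is an assumed value, not necessarily FAF's configuration.
    """
    players = list(team_a) + list(team_b)
    n = len(players)
    # Only the difference of the *summed* means enters the formula...
    delta_mu = sum(mu for mu, _ in team_a) - sum(mu for mu, _ in team_b)
    # ...together with per-player performance noise and skill uncertainty.
    variance = n * beta ** 2 + sum(sigma ** 2 for _, sigma in players)
    return math.sqrt(n * beta ** 2 / variance) * math.exp(-delta_mu ** 2 / (2 * variance))


opponents = [(1200, 60)] * 3
spread_team = [(600, 60), (1500, 60), (1500, 60)]
even_team = [(1200, 60)] * 3
# Same totals, same uncertainties: the score cannot tell the two teams apart.
print(two_team_quality(spread_team, opponents), two_team_quality(even_team, opponents))
```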

mankinskin · 2023-09-25T18:15:34Z

Aha, understood. Maybe showing the actual balance rating in the client could be an improvement? 😅

BlackYps · 2023-09-25T18:18:56Z

In theory, yes, but in my opinion it's not really worth the effort. It would be pretty complicated to get the number from the server to the client and into the replay. And a single number doesn't tell you that much anyway.

mankinskin · 2023-09-25T20:33:56Z

20857269: another game like this. I have a hard time believing this is penalized. Effectively, in global ratings, that was a game of (1300, 1000, 1000) vs (1400, 1200, 1200). Just because one player had a 3v3 rating of 800, he got matched with two 1200s, one of whom had a global rating of 1400. The 800 player had a global rating of 1200. My teammates were both 1000 in global and 3v3.

And this was when the queue was actually quite full; right now I see 20 people there.

BlackYps · 2023-09-25T20:49:51Z

Looking at the 3v3 ratings, this game seems pretty reasonable. What exactly is your problem with it? Of course the numbers add up less well when you look at the global ratings. Imagine the 800 player was a pure astro player. You just can't rely on the global rating; that is why people get initialized with a lower rating in the matchmaker. He gained 44 points from that game, so he will soon be at a rating that fits him better.
Do you have a proposal for a better solution?

mankinskin · 2023-09-25T21:05:13Z

The problem is that an 800 rating was used to balance out two 1200s in one team. But rating doesn't work like that. A team with a 500 and two 1500s is going to be a lot better than a team of three 1200s. When rating is concentrated in one position like that, it becomes much more likely that you lose strategically and tactically. Also, the skill difference between an 800 and a 1000 is not that big compared to the difference between a 1000 and a 1200; a 1200 is much further ahead of a 1000 than a 1000 is of an 800. Ratings can't just be moved around between players to balance.

I gave my proposal above: make each team's players have ratings as close to their average as possible, or at least match the slots up so that each pair of opposing slots has similar ratings. Especially when the map has an uneven distribution of mexes, concentrating a lot of rating on one spot can make a huge difference. In this game, one spot had the mexes of 3 spots because it was a 5v5 map, but the player there was a 1000 facing a 1200 in the mirrored spot, just because some tiny spot in the back was filled with an 800 (who was actually almost 1400 in global).

This happens all the time, and it's very frustrating because you end up playing against much better teams.

mankinskin · 2023-09-25T21:16:28Z

I think a good model is to match each player against their mirror on the map. As the maps are symmetric, that's usually how it plays out, and the ratings can then be used to estimate who wins each encounter. When the slots are not roughly equal, one team will almost certainly lose that spot, and there is no real way for the other players to mitigate that. So when an unimportant spot is assigned to a relatively low-rated player, the more important spots are assigned to better players to compensate, and they will win those positions against a team with more even ratings. Conversely, the single good player can also end up on an unimportant spot, and then the low-rated players lose the important spots simply because the enemy players there are more evenly, and thus more highly, rated. A single high-rated player can't carry the lower-rated players from an unimportant slot.

The only real solution is to match all slots individually and make them as similar as possible; team balance should matter on top of that. Simply putting high-rated players against low-rated players in mirrored spots is always unbalanced.
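The per-slot comparison can be sketched as below. This is a hypothetical helper that assumes opposing slots are paired by rank (best against best, and so on); it does not know which map spots actually face each other, which is the hard part of the proposal. The ratings are the 500/1500/1500 vs. three 1200s example from this thread.

```python
def slot_gaps(team_a, team_b):
    """Pair opposing players by rank (best vs best, ...) and return each pair's rating gap."""
    if len(team_a) != len(team_b):
        raise ValueError("teams must have the same size")
    ranked_a = sorted(team_a, reverse=True)
    ranked_b = sorted(team_b, reverse=True)
    return [abs(a - b) for a, b in zip(ranked_a, ranked_b)]


team_a = [500, 1500, 1500]
team_b = [1200, 1200, 1200]
gaps = slot_gaps(team_a, team_b)
# Cumulative ratings differ by only 100, but one pairing is 700 points apart.
print(sum(team_b) - sum(team_a), gaps, max(gaps))
```

A matchmaker following this proposal would bound `max(gaps)` rather than, or in addition to, the difference of team totals.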

BlackYps · 2023-09-25T21:35:40Z

The matchmaking code is here: https://github.com/FAForever/server/blob/develop/server/matchmaker/algorithm/team_matchmaker.py
IIRC the two teams get sorted by rating at some point, so the highest-rated players should be in opposing slots, then the next highest-rated face each other, and so on.

By definition, the skill gap between 800 and 1000 should be the same as between 1000 and 1200, even if it might feel different.
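The "same gap, same odds" point follows from how TrueSkill-style models turn ratings into win probabilities: only the difference between the two ratings enters. The sketch below assumes a 1v1 Gaussian model, treats displayed ratings as skill means, and uses an illustrative `beta` of 250 with zero uncertainty; none of these are claimed to be FAF's exact settings.

```python
import math


def win_probability(rating_a, rating_b, beta=250.0, sigma_a=0.0, sigma_b=0.0):
    """P(A beats B) in a TrueSkill-style Gaussian model.

    beta and the sigmas are illustrative assumptions; only the rating
    difference (not the absolute level) affects the result.
    """
    spread = math.sqrt(2 * beta ** 2 + sigma_a ** 2 + sigma_b ** 2)
    z = (rating_a - rating_b) / spread
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


low = win_probability(1000, 800)
high = win_probability(1200, 1000)
print(low, high)
```

Both calls see a 200-point gap, so they return the same probability (a bit above 0.7 with these assumed parameters).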

It will basically be impossible to quantify the importance of a slot, so I don't think it is feasible to go down that route.

mankinskin · 2023-09-25T21:41:59Z

Yes, I looked into it and saw that there is a measure of rating deviation in the teams that discounts game quality, but I wonder whether that discount is strong enough. Perhaps it's just a matter of lowering `self.MAXIMUM_RATING_DEVIATION = 250` (`server/server/config.py`, line 118 in 455912e) further? I feel like 250 rating points can make a huge difference in skill, and when this ends up as a slot matchup it can decide a game pretty quickly. Realistically it would probably only be a 125 rating difference at worst, because one team would have to be average, but even that is a lot. Maybe a value of 100 is more adequate, if I understand this correctly?

It's used here (`server/server/matchmaker/algorithm/team_matchmaker.py`, lines 294 to 301 in 455912e):

```python
rating_disparity = abs(match[0].cumulative_rating - match[1].cumulative_rating)
unfairness = rating_disparity / config.MAXIMUM_RATING_IMBALANCE
deviation = statistics.pstdev(ratings)
rating_variety = deviation / config.MAXIMUM_RATING_DEVIATION

# Visually this creates a cone in the unfairness-rating_variety plane
# that slowly raises with the time bonuses.
quality = 1 - sqrt(unfairness ** 2 + rating_variety ** 2) + time_bonus + minority_bonus
```
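For experimenting with the two limits, here is a minimal standalone re-creation of the quoted formula. It is a sketch, not the server code: it assumes `cumulative_rating` is simply the sum of a team's ratings and that `ratings` holds every player's rating in the match, and it leaves the time and minority bonuses at zero. The team lists are made-up examples.

```python
import statistics
from math import sqrt

# Values quoted from server/config.py at commit 455912e in this thread.
MAXIMUM_RATING_IMBALANCE = 250
MAXIMUM_RATING_DEVIATION = 250


def game_quality(team_a, team_b, time_bonus=0.0, minority_bonus=0.0,
                 max_imbalance=MAXIMUM_RATING_IMBALANCE,
                 max_deviation=MAXIMUM_RATING_DEVIATION):
    """Re-creation of the quoted quality formula on plain rating lists."""
    # Assumption: cumulative rating = sum of the team's ratings.
    rating_disparity = abs(sum(team_a) - sum(team_b))
    unfairness = rating_disparity / max_imbalance
    # Assumption: `ratings` = every player's rating in the match.
    deviation = statistics.pstdev(list(team_a) + list(team_b))
    rating_variety = deviation / max_deviation
    # Distance from the origin in the unfairness/variety plane lowers quality.
    return 1 - sqrt(unfairness ** 2 + rating_variety ** 2) + time_bonus + minority_bonus


team_a = [1100, 1000, 900]
team_b = [1050, 1000, 950]
q250 = game_quality(team_a, team_b)
q100 = game_quality(team_a, team_b, max_deviation=100)
print(round(q250, 3), round(q100, 3))
```

Both teams sum to 3000, so only the spread term contributes. With the quoted limit of 250 this match scores roughly 0.74; with a limit of 100 it drops to roughly 0.35, which shows how strongly that one constant controls the penalty for spread.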

BlackYps · 2023-09-25T21:48:37Z

Yes, this is the correct variable, but good luck convincing the community that it should be lowered significantly, because doing that would directly increase wait times.

mankinskin · 2023-09-25T21:56:06Z

But the quality requirements are already lowered over time, and this would mainly fix a skewed priority of team rating variety. Rating imbalance is currently prioritized over variety, which leads to exactly what we saw: equal cumulative ratings, but imbalanced teams nonetheless. I don't think it would increase wait times that much, but it should improve game quality a lot.

If it increases wait times too much, maybe the maximum imbalance can be increased. I imagine the cumulative ratings aren't that accurate anyway, and currently it's `self.MAXIMUM_RATING_IMBALANCE = 250` (`server/server/config.py`, line 116 in 455912e).

In fact, this should probably depend on how many players are in a team. An imbalance of 250 is a lot in a 2v2 game but not in a 4v4 game. Maybe it should be expressed relative to the cumulative rating: 250 is a lot for a match with 2000 cumulative rating but not for one with 4000.

That might also explain why matchmaking is worse for 4v4 matches: the imbalance requirement is effectively even stricter there, so variety ends up higher.

Edit: actually, maybe it's not a good idea to calculate imbalance relative to the cumulative rating, because that would probably mess with higher-rated games. Calculating it per player seems better: a maximum imbalance per spot makes more sense than one for the entire team, in my opinion.
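The difference between a fixed limit and a per-spot limit can be sketched like this. `per_player=100` is a hypothetical value chosen only for illustration, and neither function is part of the server.

```python
FIXED_MAXIMUM_IMBALANCE = 250  # the current global limit quoted in this thread


def within_fixed_limit(team_a, team_b, limit=FIXED_MAXIMUM_IMBALANCE):
    """Current-style rule: one limit regardless of team size."""
    return abs(sum(team_a) - sum(team_b)) <= limit


def within_per_player_limit(team_a, team_b, per_player=100):
    """Hypothetical rule: the allowed imbalance scales with team size."""
    if len(team_a) != len(team_b):
        raise ValueError("teams must have the same size")
    return abs(sum(team_a) - sum(team_b)) <= per_player * len(team_a)


two_v_two = ([1000, 1000], [1125, 1125])              # 250 apart
four_v_four = ([1000] * 4, [1100, 1100, 1100, 1050])  # 350 apart
print(within_fixed_limit(*two_v_two), within_per_player_limit(*two_v_two))
print(within_fixed_limit(*four_v_four), within_per_player_limit(*four_v_four))
```

The fixed rule accepts the 2v2 that is 250 apart but rejects the 4v4 that is 350 apart; the per-player rule does the opposite, tightening small teams and loosening large ones.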

BlackYps · 2023-09-25T22:00:21Z

When I ran performance tests with artificial data, the rating spread was the main factor determining wait time. It is actually pretty easy to distribute several players so that both teams are almost equal, but it is very hard to find six or eight players of basically the same rating.
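This observation can be illustrated with a brute-force team split. The sketch assumes a hypothetical pool of eight queued ratings and simply tries every 4v4 split; the real matchmaker works differently. Finding a split with equal totals is easy, but the spread of the match is fixed by who is in the pool, whichever split is chosen.

```python
from itertools import combinations
from statistics import pstdev


def best_split(ratings):
    """Try every split of an even-sized pool into two equal-sized teams and
    return (imbalance, team_a, team_b) for the smallest total difference."""
    n = len(ratings)
    if n % 2:
        raise ValueError("need an even number of players")
    indices = range(n)
    best = None
    # Keep player 0 on team A so each split is only visited once.
    for rest in combinations(indices[1:], n // 2 - 1):
        a_idx = {0, *rest}
        team_a = [ratings[i] for i in indices if i in a_idx]
        team_b = [ratings[i] for i in indices if i not in a_idx]
        imbalance = abs(sum(team_a) - sum(team_b))
        if best is None or imbalance < best[0]:
            best = (imbalance, team_a, team_b)
    return best


pool = [800, 900, 1000, 1100, 1200, 1300, 1400, 1500]
imbalance, team_a, team_b = best_split(pool)
# The spread of the match depends only on the pool, not on the split.
print(imbalance, team_a, team_b, pstdev(pool))
```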

mankinskin · 2023-09-25T22:03:00Z

Okay, that makes sense. Personally I would prefer longer wait times over worse-quality games, but I don't know how the community feels about it.

What about calculating the imbalance relative to the number of players in a team? Basically, have the imbalance limit apply to each spot individually, so larger teams get more leeway on imbalance and are more likely to end up with less variety?

BlackYps · 2023-09-25T22:03:07Z

Similarly, it is much easier to mix players into comparable teams when you have four players per team compared to only two. So while you are technically correct, in practice it doesn't matter that this balance requirement doesn't scale per player.

mankinskin · 2023-09-25T22:05:42Z

Okay, yeah, that's interesting. I wonder how that actually works out.

I don't know. I feel like matches with a lot of variety happen quite frequently. Maybe this can be investigated a bit more if someone works on the balancing again.

BlackYps · 2023-09-25T22:05:48Z

You can search the forums for discussions about it. At some point there was an active thread arguing for higher-quality matches, and at the same time different people argued for shorter wait times in another thread. Sadly, they didn't connect with each other.

mankinskin · 2023-09-25T22:08:00Z

Thanks for your help.

BlackYps · 2023-09-25T22:08:04Z

Right now we seem to be at a point where both groups are roughly the same size, so we can't be too far from the optimum. And I have decided to stay a looong way away from further tuning the algorithm, because the endless discussions about every single change are just too draining.

mankinskin · 2023-09-25T22:09:20Z

So maybe get the config from the client? Have users configure their requirements themselves? Or offer two presets, one for quick matching and one for better quality?

BlackYps · 2023-09-25T22:11:12Z

This has already been discussed at length on the forum as well

mankinskin · 2023-09-26T14:03:56Z

Another idea might be to keep the rating variety in both teams roughly equal. I just had this game, 20860454, where one team was very varied and the other more even. The team with more variance had two noob players and one pro; the noob players kept feeding us and eventually lost their spots. If we had had equal variance, this would have been the case for both teams.
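One hypothetical way to express this idea is a metric for how different the two teams' internal spreads are. The function name is invented for illustration, and the ratings below are made up to mirror the "two noob players and one pro" description, since the thread gives no numbers for that game.

```python
from statistics import pstdev


def variety_mismatch(team_a, team_b):
    """Hypothetical metric: how different the two teams' internal rating spreads are."""
    return abs(pstdev(team_a) - pstdev(team_b))


mixed = [600, 600, 1800]   # two weak players and one strong one
even = [1000, 1000, 1000]
print(sum(mixed) == sum(even), variety_mismatch(mixed, even))
```

Both teams sum to 3000, so a cumulative check sees no problem, while the mismatch is about 566 rating points. If such a term were adopted, one option would be a third component under the square root of the existing quality formula, scaled by its own limit; that is only a suggestion, not something the server does.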

mankinskin added the bug label Sep 17, 2023

Sheikah45 transferred this issue from FAForever/downlords-faf-client Sep 17, 2023
