Checking if Elo system is oppressive [With proofs]

xtreme2020
But technically yes it’s unfair if you play players that much lower rated, which isn’t what is going on here.
basketstorm
xtreme2020 wrote:
#20 a 50% win rate is needed for a stable rating, actually slightly less counting in draws. Two of your players had a “hidden elo” of around 1250, but a real elo of 500 and 1000. That means ChatGPT calculated something wrong, as assuming your “hidden elo” is a better calculation of the real strength, the 500 player would have had to lose 50% of his games against players that a player of similar “hidden elo” would almost never lose to. This is impossible. Therefore, the hidden elo is less accurate than the real elo.

No, the simulation wasn't that stupid, I've checked the code. ChatGPT didn't calculate anything, it made a program to run the simulation. The outcome of each game was simulated using probabilities based on the hidden strength of the players. You trust rated Elo too much. Just think about the initial situation: all players are rated the same when they sign up. How accurate is that real Elo then? How can you raise or lower the Elo of a player if he competes against players with inaccurate Elo? That's the problem.
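
To make "outcome simulated using probabilities according to hidden strength" concrete, here is a minimal sketch. It is not the actual simulation code (that was never posted); the function names, the seed and the decision to ignore draws are assumptions for illustration, and the win probability is the usual logistic Elo curve.

```python
import random

def win_probability(strength_a, strength_b):
    """Logistic Elo curve: chance that A beats B, given hidden strengths."""
    return 1.0 / (1.0 + 10 ** ((strength_b - strength_a) / 400.0))

def simulate_game(strength_a, strength_b, rng):
    """Return 1 if A wins, 0 if B wins (draws ignored for simplicity)."""
    return 1 if rng.random() < win_probability(strength_a, strength_b) else 0

# A hidden 1400 against a hidden 1000, many games with a fixed seed.
rng = random.Random(42)
n_games = 20000
wins = sum(simulate_game(1400, 1000, rng) for _ in range(n_games))
win_rate = wins / n_games
print(f"1400 vs 1000 win rate: {win_rate:.3f}")
```

The displayed rating never enters this step; only the hidden strengths decide who wins.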

xtreme2020
#23 also, what you don’t realize is that if you have a stable elo, that is your skill level. There is never a hidden skill level, unless you’ve recently improved or gotten worse and haven’t reached your actual elo yet. Your skill level can always be quantified with an elo number, and throughout enough games you will get to that elo number, all the time.
basketstorm
xtreme2020 wrote:
#23 also, what you don’t realize is that if you have a stable elo, that is your skill level. There is never a hidden skill level, unless you’ve recently improved or gotten worse and haven’t reached your actual elo yet. Your skill level can always be quantified with an elo number, and throughout enough games you will get to that elo number, all the time.

If only all Elos were correct in the first place. But if you are rated 600 and competing against a freshly joined 200 who is actually stronger than you, and you lose to him and drop a good amount of Elo because the system treats his 200 as "real", that's not even close to a quantification of someone's skill. That's just chaos.
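
A small worked example of that 600 vs 200 case, as a sketch only. It assumes plain Elo with K = 32 (chess.com actually uses a Glicko variant), and the newcomer's hidden strength of 800 is a made-up number for illustration.

```python
def expected_score(r_a, r_b):
    """Expected score of A against B under the logistic Elo formula."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

K = 32                      # assumed K-factor
my_rating = 600
his_rating = 200            # what the system sees
his_hidden_strength = 800   # hypothetical true strength of the newcomer

system_expects = expected_score(my_rating, his_rating)            # about 0.91
actually_likely = expected_score(my_rating, his_hidden_strength)  # about 0.24
loss_delta = K * (0.0 - system_expects)                           # my change after a loss

print(f"system expects me to score {system_expects:.2f}")
print(f"my real chance is closer to {actually_likely:.2f}")
print(f"rating change after losing: {loss_delta:.1f}")
```

So the 600 player is charged as if he had blown a 91% game, while under these assumed numbers he was the underdog all along.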

xtreme2020
#32 if you’ve just signed up or haven’t played in a while, your rating is inaccurate. Otherwise, you beat 50% of the people at your skill level. That’s the definition of being at that skill level, I assume you agree with this. Therefore, after playing enough games on chess.com, you’ll get to the point where you have a 50% win rate and aren’t going up or down on average, unless you’re improving.
xtreme2020
#34 you gain and lose less by playing against new players, and most players you play aren’t like that, so it’ll balance out eventually despite a few mistaken losses (and probably some mistaken wins)
basketstorm

Beating 50% of the people at your skill level is the ideal situation on paper, but it doesn't happen; you rarely encounter such people. To encounter them you need accurate Elos for: 1) you 2) your opponent. But because all fresh accounts get exactly the same artificially low Elo (200 or 400 in Rapid), there's no way to do accurate pairing, pairing by actual skill. The situation will not improve even after 1000 games because the real skill of players fluctuates that much in the low-Elo zone. You need to be a really strong player to break through.

Also, because there's a rating floor (100 Elo), when someone gains points by beating you while you are already at the floor, that's pure inflation: the sum of all ratings in the system grows.
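
The floor effect is easy to check with a few lines. This is a sketch assuming plain Elo with K = 32 and a floor of 100; the `play` helper is hypothetical, not anything from chess.com.

```python
def expected_score(r_a, r_b):
    """Expected score of A against B under the logistic Elo formula."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def play(winner, loser, k=32, floor=100):
    """Return (new_winner_rating, new_loser_rating), clamping at the floor."""
    delta = k * (1.0 - expected_score(winner, loser))
    return winner + delta, max(floor, loser - delta)

# Away from the floor the exchange is zero-sum.
mid_w, mid_l = play(500, 500)
print(mid_w + mid_l)    # same total as before the game

# At the floor the loser cannot pay, so the total grows.
before = 100 + 100
floor_w, floor_l = play(100, 100)
after = floor_w + floor_l
print(after - before)   # points created out of nothing
```

Every game lost by someone sitting on the floor adds points to the pool, and those points then spread upward as the winners play other people.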

basketstorm
xtreme2020 wrote:
#34 you gain and lose less by playing against new players, and most players you play aren’t like that, so it’ll balance out eventually despite a few mistaken losses (and probably some mistaken wins)

That's not exactly how Glicko works. It's not about being new or old player, the system tracks your inactivity and increases deviation. But while inactive player is penalized for the loss more, you don't gain less than from beating an active player.
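
For reference, this is how the published Glicko-1 system inflates rating deviation (RD) with inactivity. A sketch only: the constant `c` is chosen by whoever runs the system, and the value 34.6 below is just an illustrative assumption, not chess.com's setting.

```python
import math

MAX_RD = 350.0  # RD of an unrated player in Glicko-1

def rd_after_inactivity(rd, c, periods):
    """Glicko-1 step: RD grows with the number of rating periods missed."""
    return min(math.sqrt(rd ** 2 + c ** 2 * periods), MAX_RD)

c = 34.6  # assumed constant, controls how fast uncertainty comes back
for periods in (0, 10, 50, 100, 200):
    print(periods, round(rd_after_inactivity(50.0, c, periods), 1))
```

The rating itself does not move while you are away; only the uncertainty around it grows, up to the cap.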

basketstorm
chesssblackbelt wrote:

i checked my win % against players my own rating over thousands of games once

it was 47% wins, 6% draws, 47% losses

elo is crazy accurate

Elo on average is accurate especially for skilled players. But not in low-Elo zone where actual skill is random and does not correspond to the rating well.

xtreme2020
#39 again, there aren’t enough very very new people for that to really affect you. Their elo will be drastically inaccurate for around 5 games, at which point it’ll be reasonably accurate, enough to give you a decent chance. The vast majority of games played are played by players with more than 5 games total played, if you get what I’m saying.
basketstorm
xtreme2020 wrote:
#39 again, there aren’t enough very very new people for that to really affect you. Their elo will be drastically inaccurate for around 5 games, at which point it’ll be reasonably accurate, enough to give you a decent chance. The vast majority of games played are played by players with more than 5 games total played, if you get what I’m saying.

You don't need very very new. People can keep that skill-rating mismatch for years, I've seen it, I've met such players.

xtreme2020
Looking at your last 10 games, 9/10 were played against people with over 50 games (only blitz, not even counting other time controls), and the other one was against someone who had played 30+ games, enough to get him very close. Even if we assume 1/10 of your games are a completely unfair loss with no chance for you, you’ll still only be ~80 elo below your real rating after a thousand games, and in a thousand games almost everyone improves more than 80 rating points.
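
A rough way to sanity-check that kind of estimate. Assume (hypothetically) plain Elo, opponents always rated the same as you, a fixed fraction of games that are lost no matter what, and the rest decided by your true strength. Your rating stops drifting when your average actual score equals the 0.5 the system expects. The sketch below solves for that gap; the model and the function name are assumptions, not chess.com's actual system.

```python
import math

def equilibrium_gap(unfair_fraction):
    """Rating points below true strength at which the average drift is zero.

    Assumed model: every opponent is rated equal to you, a fraction
    `unfair_fraction` of games is a guaranteed loss, and the remaining
    games are won with probability p given by the Elo curve for the gap.
    """
    fair = 1.0 - unfair_fraction
    # Zero drift: fair * p + unfair * 0 = 0.5, so p = 0.5 / fair
    p = 0.5 / fair
    if p >= 1.0:
        raise ValueError("too many unfair games for an equilibrium to exist")
    # Invert p = 1 / (1 + 10 ** (-gap / 400))
    return 400.0 * math.log10(p / (1.0 - p))

gap = equilibrium_gap(0.10)
print(f"equilibrium gap with 10% unfair losses: {gap:.1f} points")
```

Under these assumptions the gap comes out at a few dozen points, so it stays bounded instead of growing with the number of games.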
xtreme2020
#45 and why don’t you take a crack at explaining how, considering you win 50% of your games at your skill level, as that is the definition of you being at that skill level
xtreme2020
And in #46 that’s only until you get out of the zone of new accounts, at which point no one has a crazy rating mismatch and your rating jumps up to its actual skill level
IndianCamels
basketstorm wrote:

Using ChatGPT powers I simulated 1000000 chess games in a pool of 1000 players. Pairing was rating based with small diffusion to emulate online presence factor. Win/loss factor - just like prescribed by Elo. All players had hidden strength in Elo: 90% of players - from 1000 to 1400, 10% players - from 1400 to 2800. Initial rating was 200, rating floor - 100.

Graphs:

Blue: initial strength distribution.

Green: ratings after simulation show that the largest group is minimal-Elo players. The mid-Elo group received an artificial bump despite the fact that the strength of players was constant during the simulation!

Full table with data for each player (names are all fake based on names of real great players and names repeat but that doesn't matter because each player has unique id):
https://pastebin.com/raw/JqGKun3K

Conclusion:

best of the best climbed to the top easily.
Low elo players unfairly end up in a various rating ranges, apparently because of luck, not because of lack of skill. And now you can't blame virtual players for lack of skill. Because game result was dictated by their actual hidden strength.

So in the end we have cases like:

id   player_name       hidden_strength_Elo  final_rating_Elo
176  Magnus Portisch   1097                 509
468  Vladimir Svidler  1263                 497
571  Sergey Short      1239                 1042

That means actual strength could be 1200, but rating could be 500 OR 1000.

Or look at this oppressed guy:

id   player_name           hidden_strength_Elo  final_rating_Elo
467  Boris Nepomniachtchi  1203                 355

With strength 1203, his rating is 355.
Each player here played 1000 games!

Some more oppression:

id   player_name        hidden_strength_Elo  final_rating_Elo
266  Hikaru Gajdosko    1112                 100
322  Magnus Capablanca  1003                 132

Magnus is weaker than Gajdosko but Gajdosko is stuck at 100. Is this fair?

This all aligns with my observations and experience here on chess.com and explains why many people are astonished by the randomness in the apparent strength of opponents who have the same rating.

Thoughts?

First of all, we need to take into account that this doesn't reflect real life. Second, you are basically reinventing the skill range by making everyone the same rating. It's not possible for one to become 2800 if the starting rating is 1000, unless you're using crazy scales. Thus, your experiment has a bunch of logical errors. Calling the rating system oppressive would also be false due to large amounts of proof that rating systems heavily over-reward overperformance, and do not heavily hurt people for underperformance. Don't use chatGPT for math unless you want bad answers. Instead, get a simulation of real people.

IndianCamels

There are more reasons why this is inaccurate. If we were using elo, the higher the rating of the player, the less players of his rating, the less points he gets per game. This is why it is so hard for the top 10 to hit 2900. Simply put, they would have to win many chess games in a row.

basketstorm
xtreme2020 wrote:
And in #46 that’s only until you get out of the zone of new accounts, at which point no one has a crazy rating mismatch and your rating jumps up to its actual skill level

So you finally admit there's crazy rating mismatch? But why do you think that this unfair zone ever ends somewhere except for the very tops? Simulation data suggests that there's no linearity in actual strength vs rating difference even for >2000 Elo.

As for my games, let's not talk about my games. This is a global problem, not mine. All my points stand strong: your denial is based on beliefs, while my claims are based on math and realistic simulation data, and are backed up by the real experience of players (see the frequent "game is rigged" topics, there's no wave without wind). And common logic is enough to understand that a rating system where everyone starts at 200, has a floor at 100, and applies rating changes as if every rating were real can't be fair.

@IndianCamels

I didn't make everyone the same rating. In my simulation no one became 2800 from 1000.

IndianCamels wrote:
Don't use chatGPT for math unless you want bad answers. Instead, get a simulation of real people.

I didn't use it for math. I provided the math. I proof-read his simulation project. All is good. You don't even need a simulation, just a paper and pencil, try thinking about Elo rating system, make some calculations and you'll quickly realize that it's not going to work fairly. It has to be oppressive, there's no other way.

Knowing all the formulas it is trivial to simulate rating flow of any player base very realistically with various parameters and see where it fails. That's not rocket science.
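
For anyone who wants to try it, here is a scaled-down sketch of such a simulation. It is not the original project's code (that was never posted): the pairing rule, K = 32, the pool size and the absence of draws are all assumptions. Only the starting rating of 200, the floor of 100 and the 90%/10% strength split come from the setup described in this thread.

```python
import random

def expected_score(r_a, r_b):
    """Expected score of A against B under the logistic Elo formula."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def simulate(n_players=200, n_games=20000, k=32, start=200, floor=100, seed=0):
    rng = random.Random(seed)
    # Hidden strengths: 90% in 1000-1400, 10% in 1400-2800.
    strength = [
        rng.uniform(1000, 1400) if rng.random() < 0.9 else rng.uniform(1400, 2800)
        for _ in range(n_players)
    ]
    rating = [float(start)] * n_players
    for _ in range(n_games):
        a = rng.randrange(n_players)
        # Assumed pairing: among a few random "online" players,
        # take the one closest in displayed rating.
        online = [c for c in (rng.randrange(n_players) for _ in range(10)) if c != a]
        if not online:
            continue
        b = min(online, key=lambda c: abs(rating[c] - rating[a]))
        # The result depends on hidden strength only.
        score_a = 1.0 if rng.random() < expected_score(strength[a], strength[b]) else 0.0
        # The rating update depends on displayed ratings only.
        delta = k * (score_a - expected_score(rating[a], rating[b]))
        rating[a] = max(floor, rating[a] + delta)
        rating[b] = max(floor, rating[b] - delta)
    return strength, rating

strength, rating = simulate()
# Print a few players: hidden strength next to final rating.
for s, r in sorted(zip(strength, rating))[:5]:
    print(round(s), round(r))
```

Changing `start`, `floor`, `k` or the pairing rule and comparing the printed pairs is the quickest way to see which parameter is responsible for which effect.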

basketstorm
IndianCamels wrote:

There are more reasons why this is inaccurate. If we were using elo, the higher the rating of the player, the less players of his rating, the less points he gets per game. This is why it is so hard for the top 10 to hit 2900. Simply put, they would have to win many chess games in a row.

That's not how it works. You don't get less points per game just because there are not enough players in your rating range.
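
A quick check of what the points per game actually depend on. This is a sketch assuming plain Elo with one fixed K = 32 for everybody (real systems often use a smaller K at the top, which is a separate matter).

```python
def expected_score(r_a, r_b):
    """Expected score of A against B under the logistic Elo formula."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def gain_for_win(winner, loser, k=32):
    """Points the winner gets; depends only on the rating difference and K."""
    return k * (1.0 - expected_score(winner, loser))

print(gain_for_win(1000, 1000))            # equal opponents at the bottom
print(gain_for_win(2800, 2800))            # equal opponents at the top, same gain
print(round(gain_for_win(2800, 2600), 2))  # beating a lower-rated opponent pays less
```

The formula has no term for how many players share your rating range. A top player gains less per win only when his opponents are rated below him.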

IndianCamels
basketstorm wrote:
IndianCamels wrote:

There are more reasons why this is inaccurate. If we were using elo, the higher the rating of the player, the less players of his rating, the less points he gets per game. This is why it is so hard for the top 10 to hit 2900. Simply put, they would have to win many chess games in a row.

That's not how it works. You don't get less points per game just because there are not enough players in your rating range.

Look up the calculation for elo and then tell me.

IndianCamels
basketstorm wrote:
xtreme2020 wrote:
And in #46 that’s only until you get out of the zone of new accounts, at which point no one has a crazy rating mismatch and your rating jumps up to its actual skill level

So you finally admit there's crazy rating mismatch? But why do you think that this unfair zone ever ends somewhere except for the very tops? Simulation data suggests that there's no linearity in actual strength vs rating difference even for >2000 Elo.

As for my games, let's not talk about my games. This is a global problem, not mine. All my points stand strong: your denial is based on beliefs, while my claims are based on math and realistic simulation data, and are backed up by the real experience of players (see the frequent "game is rigged" topics, there's no wave without wind). And common logic is enough to understand that a rating system where everyone starts at 200, has a floor at 100, and applies rating changes as if every rating were real can't be fair.

@IndianCamels

I didn't make everyone the same rating. In my simulation no one became 2800 from 1000.

IndianCamels wrote:
Don't use chatGPT for math unless you want bad answers. Instead, get a simulation of real people.

I didn't use it for math. I provided the math. I proof-read his simulation project. All is good. You don't even need a simulation, just a paper and pencil, try thinking about Elo rating system, make some calculations and you'll quickly realize that it's not going to work fairly. It has to be oppressive, there's no other way.

Knowing all the formulas it is trivial to simulate rating flow of any player base very realistically with various parameters and see where it fails. That's not rocket science.

Elo uses a win-probability model (originally a normal distribution, nowadays usually a logistic curve) to determine how many points you get. Ex: you are expected to score about 91% against a player rated 400 points below you.
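
The expected score for a given rating gap can be computed directly. A sketch with two variants: the logistic curve most sites and federations use today, and a normal-distribution version in the spirit of Elo's original model (the standard deviation of 200 per player is the textbook assumption, and the function names are mine).

```python
import math
from statistics import NormalDist

def expected_logistic(diff):
    """Expected score for a player rated `diff` points above the opponent."""
    return 1.0 / (1.0 + 10 ** (-diff / 400.0))

def expected_normal(diff, sd_per_player=200.0):
    """Same quantity if each player's performance is normal with the given sd."""
    sd_of_difference = sd_per_player * math.sqrt(2)
    return NormalDist(mu=0.0, sigma=sd_of_difference).cdf(diff)

for diff in (0, 100, 200, 400):
    print(diff, round(expected_logistic(diff), 3), round(expected_normal(diff), 3))
```

Either way, a 400-point edge comes out a little above 90%, and the rating change after a game is K times the difference between the actual and the expected score.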