Checking if Elo system is oppressive [With proofs]

basketstorm
chesssblackbelt wrote:
RandomChessPlayer62 wrote:

Also, maybe people are stuck at low Elo in the simulation because there isn't enough Elo in the system?

They are stuck because chess is one of the most played games in the world and it's difficult to improve in because it's so competitive

Your message is simply dismissive; it ignores the problem. Dismissing, laughing, or comparing it to a conspiracy theory is not a way to deal with this issue. Read my previous post, at least the conclusion: it's easy to understand and it makes very strong points.

GooseChess
basketstorm wrote:

The data is not coming from the language model itself. It's a result of actual program execution. So it is indeed a "custom written program" in this case.

And I think you overestimate the complexity of this task.

Can you share the code?

RandomChessPlayer62

For an accurate simulation, we don't just have to account for skill; we have to account for skill changing over time, good days, bad days, and other multipliers such as feelings and focus shifting on winning/losing streaks. People aren't just numbers.

I've mentioned rating deflation, but I haven't explained its main cause and why it doesn't exist in reality. In your simulation there is a finite number of players starting at 200 Elo who then play against each other with their ratings changing over time. If there were no deflation, your graphs would just look ridiculous. There's rating deflation because even if every player's skill were your set minimum, 1000, we'd need 1,800,000 more Elo in the system (if I got my math right). As you have mentioned, some extra Elo is added at the floor, but it's far too small to cover a 1,800,000 Elo shortfall; what is required is lots of new players added every tick.

Also, as my previous messages have called out, you assume complete beginners don't exist among your hidden skill possibilities, which would probably affect the simulation. Beginners, real people, and new players frequently joining all happen in real chess, while they do not in your simulation.
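To make the "not enough Elo in the system" point concrete, here is a minimal sketch (a toy example, not anyone's actual simulation code): the standard Elo update is zero-sum inside a closed pool, so with K=16 and no floor bonus the average rating stays pinned at the 200 starting value no matter how many games are played, even if every hidden strength is 1000 or more.

```python
import random

def expected(ra, rb):
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))

def play_one_game(ratings, skills, k=16):
    """Pick two players at random; decide the winner from hidden skill; update ratings zero-sum."""
    a, b = random.sample(range(len(ratings)), 2)
    score_a = 1.0 if random.random() < expected(skills[a], skills[b]) else 0.0
    delta = k * (score_a - expected(ratings[a], ratings[b]))
    ratings[a] += delta   # whatever A gains...
    ratings[b] -= delta   # ...B loses, so the total never changes

random.seed(1)
n = 100
ratings = [200.0] * n                                      # everyone starts at 200
skills = [random.uniform(1000, 2000) for _ in range(n)]    # assumed hidden strengths, all >= 1000

print("average rating before:", sum(ratings) / n)
for _ in range(50_000):
    play_one_game(ratings, skills)
print("average rating after: ", round(sum(ratings) / n, 1))   # still 200: the missing Elo never appears
```

The only ways the pool's average can rise are the floor bonus and new players joining, which is exactly the shortfall described above.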

basketstorm

Think about this: if two players, both rated 200 initially, with 100 as the rating floor, play 1000 games against each other with K=16 like here on chess.com, is there any chance their final ratings will be similar if they have equal strength and the win rate is 50% with +/-0.1% error? You would think: "of course, maybe a small deviation, but yeah, ratings should be pretty close". Wrong. 170 vs 230, 214 vs 182, 151 vs 248 - these are possible final ratings even after 1000 games. This is like +/-15% prediction error. 35, 50, 65% - who knows. So inaccurate, even for equally strong players.

With 1500 as the starting rating, however, the prediction inaccuracy in this simulation is much smaller (~1%).

But the issue is not only about starting ratings: localized graphs create distortions in the global fabric of Elo ratings.
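For anyone who wants to try this at home, here is a stripped-down sketch of that two-player experiment (not the full simulation behind the graphs, just the setup described above: two identical players, start 200, floor 100, K=16, each game decided by a fair coin). Run it with a few different seeds and the gaps between the two final ratings should be of the same order as the numbers quoted above.

```python
import random

K = 16
START, FLOOR, GAMES = 200.0, 100.0, 1000

def expected(ra, rb):
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))

def run(seed):
    random.seed(seed)
    a = b = START
    for _ in range(GAMES):
        score_a = 1.0 if random.random() < 0.5 else 0.0   # equal strength: a fair coin
        delta = K * (score_a - expected(a, b))
        a = max(FLOOR, a + delta)                         # rating floor at 100
        b = max(FLOOR, b - delta)
    return round(a), round(b)

for seed in range(5):
    print(run(seed))   # five independent 1000-game matches between identical players
```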

basketstorm
RandomChessPlayer62 wrote:

For an accurate simulation, we don't just have to account for skill, we have to account for the changing of skill, good days, bad days, and other multipliers such as feelings and focus changing on winning/losing streaks.

Additional variables don't solve the described fundamental flaws of the system.

200 Elo as the start, 100 Elo as the floor, +/-8 Elo as the rating change when playing a roughly equally rated opponent. Deal with it. Sort people by skill with such input. You can't. Only those who are significantly stronger will move into a more or less accurate tier. The rest will be squeezed into a tight, chaotic range for years. No need to use any simulations to understand this. And yes, I tried introducing more 200-Elo players later, once the pool is formed. It doesn't change anything, because there are still low-Elo players with significant randomness in their skill.
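A rough sketch of that "try to sort them" exercise (a toy version with assumptions spelled out: hidden strengths uniform between 1000 and 2000, everyone starts at 200 with a 100 floor and K=16, and each round pairs players with the closest current ratings). It prints how well the final ratings agree with the hidden skill ordering and how compressed the rating range ends up:

```python
import random

K, START, FLOOR = 16, 200.0, 100.0

def expected(ra, rb):
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))

def ordering_agreement(ratings, skills):
    """Fraction of player pairs whose rating order matches their skill order (1.0 = perfect sort)."""
    n, good, total = len(ratings), 0, 0
    for i in range(n):
        for j in range(i + 1, n):
            total += 1
            if (ratings[i] - ratings[j]) * (skills[i] - skills[j]) > 0:
                good += 1
    return good / total

random.seed(0)
n = 200
skills = [random.uniform(1000, 2000) for _ in range(n)]   # assumption: no complete beginners
ratings = [START] * n

for _ in range(500):                                      # 500 rounds, one game per player per round
    order = sorted(range(n), key=lambda i: ratings[i])    # pair neighbours by current rating
    for x in range(0, n, 2):
        a, b = order[x], order[x + 1]
        score_a = 1.0 if random.random() < expected(skills[a], skills[b]) else 0.0
        delta = K * (score_a - expected(ratings[a], ratings[b]))
        ratings[a] = max(FLOOR, ratings[a] + delta)
        ratings[b] = max(FLOOR, ratings[b] - delta)

print("rating/skill ordering agreement:", round(ordering_agreement(ratings, skills), 3))
print("rating range:", round(min(ratings)), "to", round(max(ratings)),
      "for a 1000-point hidden skill range")
```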

RandomChessPlayer62
basketstorm wrote:

Think about this: if two players, both rated 200 initially, with 100 as the rating floor, play 1000 games against each other with K=16 like here on chess.com, is there any chance their final ratings will be similar if they have equal strength and the win rate is 50% with +/-0.1% error? You would think: "of course, maybe a small deviation, but yeah, ratings should be pretty close". Wrong. 170 vs 230, 214 vs 182, 151 vs 248 - these are possible final ratings even after 1000 games. This is like +/-15% prediction error. 35, 50, 65% - who knows. So inaccurate, even for equally strong players.

With 1500 as the starting rating, however, the prediction inaccuracy in this simulation is much smaller (~1%).

But the issue is not only about starting ratings: localized graphs create distortions in the global fabric of Elo ratings.

What do we count as close ratings? I say a 60 Elo difference is close, even a 100 Elo difference.

basketstorm

100 Elo difference is ~14% shift in winning chances, 60 Elo - ~8.5% shift. Just because 60 or even 100 seems like a small number (compared to what, a million?), doesn't mean it is something that can be neglected when we measure player's performance.
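For reference, those shifts follow directly from the standard Elo expected-score formula E = 1 / (1 + 10^(-D/400)); a quick check:

```python
def expected_score(diff):
    """Expected score of the higher-rated player at rating difference `diff`."""
    return 1.0 / (1.0 + 10 ** (-diff / 400.0))

for diff in (60, 100):
    print(f"{diff} Elo difference -> {expected_score(diff) - 0.5:+.1%} shift from 50%")
# prints roughly +8.6% for 60 and +14.0% for 100
```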

Alexeivich94
basketstorm wrote:

100 Elo difference is ~14% shift in winning chances, 60 Elo - ~8.5% shift. Just because 60 or even 100 seems like a small number (compared to what, a million?), doesn't mean it is something that can be neglected when we measure player's performance.

Your simulation had players with actual strength ~1000 ending up near their starting Elo. There is no way the math or the parameters check out for that script you used.

basketstorm
Alexeivich94 wrote:
basketstorm wrote:

100 Elo difference is ~14% shift in winning chances, 60 Elo - ~8.5% shift. Just because 60 or even 100 seems like a small number (compared to what, a million?), doesn't mean it is something that can be neglected when we measure player's performance.

Your simulation had players with actual strength ~1000 ending up near their starting Elo. There is no way the math or the parameters check out for that script you used.

You've misunderstood the math, and your comment demonstrates a common misconception: the assumption that the Elo rating system "somehow", "magically" learns some absolute value, some true player strength. You'd expect all ratings to grow, leaving the starting Elo behind? That is not possible. It's important to understand that Elo is relative. You'd have to keep extending the room on both sides to keep sorting all the players by their skill. This does not happen on chess.com.

And a strength of 1000 in the simulation was actually the starting strength. So it's normal and expected that the weakest (1000-strength) players received mostly low ratings. It's not normal, however, that noticeably stronger players also remained in that low-Elo category. It's not a flaw of the simulation, because we see an identical situation in real life here on chess.com: the apparent strength of low-Elo players differs significantly.

The point is: in a fair system, a difference in strength must be reflected in the rating difference. Here that is simply not possible because of the severe localisation of graphs that is inevitable in a player base of this size. The low starting Elo and the rating floor contribute to the problem.

IndianCamels

Strength is a very badly used term. Chess strength varies. For example, I am around 1500 blitz, but 2000+ rapid. You start low because low elo = less skill. If you started high, then you would get pushed down by higher rated players. Ratings are low and high because of relativity. Also, FIDE does NOT use rating floors. It is reflected. 1800's will beat 1400's most of the time, 400's will lose to 2000's most of the time... One key thing: There are always upsets. People can change based on their environment and energy. Just because you keep losing to low-rated players, doesn't mean the system is oppressive and actively trying to bring you down. We ALL use the same system. Math doesn't discriminate.

IndianCamels

Also, GRAPHS can lie.

IndianCamels

The definition of oppressive: unjustly inflicting hardship and constraint, especially on a minority or other subordinate group

It's not oppressive.

creepingdeath1974

Hello everyone. I was doing a bit of reading in this thread, and I find that I have some thoughts about the way people seem to play the time controls. I have noticed that people on this site who play long time controls (slow chess) make their moves like they are playing a speed chess style of game (1 minute, 3 minute, 5 minute, and so on), while people who play speed chess play like it's a slow chess style of game (30 minute, 60 minute, and so on).

I am definitely no statistician or expert, nor am I an IM or GM, but I do love playing this game called chess. The way I see it, most others are playing the game for their own specific chess goals, but anybody who doesn't make every attempt to play for the checkmate win in every one of their games is cheating themselves while trying to cheat others in the process. Unfairly at that. I find that by playing bullet games, I leave myself with no excuse for not playing for the checkmate win as opposed to the lesser ways of winning. Who likes to play their bullet games for the time win, just to lose on time? Is it not better to win games by resignation than by your opponent flagging? I believe checkmate wins are more valuable; and is it or is it not the objective of the game of chess to win by checkmate?

Why doesn't anybody talk about how, since this is an online chess site, nobody knows who is actually playing fairly according to chess.com's fair play policy and who is not? Especially compared to actual OTB chess games, where it is truly easier to oversee the behavior of the players involved, because it is face to face and in person. I figure if people on this site want to play this game online, we should all remember that online chess is supposed to be played the same way as OTB chess. With chess being played online more now than ever before, it is much easier for people to use computer software (chess engines). This is nothing new, but imagine if online chess had been around 40 years ago or more... if chess.com and other chess-related organizations had existed back then like they do today, perhaps I could have been the next American world chess champion after Bobby Fischer. Only God knows, really.

I am not trying to overcomplicate my understanding of chess, or diminish anyone else's, but to hopefully provide some feedback and my own personal comments. Whether anyone agrees or disagrees, in whole or in part, I respect that. Personally, if people want to play their games in a slow thinking/moving manner, then they should play a long, slow time control. If they want to play in a speedy thinking/moving manner, then there is the option of playing speed chess. In my own experience, the only thing that really makes sense to me is that if people play bullet chess, for example, then there is no excuse to play slowly, let alone not take the opportunity to win their bullet games by checkmate.

basketstorm
IndianCamels wrote:

Strength is a very badly used term. Chess strength varies. For example, I am around 1500 blitz, but 2000+ rapid. You start low because low elo = less skill. If you started high, then you would get pushed down by higher rated players. Ratings are low and high because of relativity. Also, FIDE does NOT use rating floors. It is reflected. 1800's will beat 1400's most of the time, 400's will lose to 2000's most of the time... One key thing: There are always upsets. People can change based on their environment and energy. Just because you keep losing to low-rated players, doesn't mean the system is oppressive and actively trying to bring you down. We ALL use the same system. Math doesn't discriminate.

Thanks for your comment, dear friend. But your comment reflects some common misconceptions about ratings, skill, and the way Elo systems work. Let’s break down the claims and address where they go wrong:

1. Misunderstanding of "Strength" as a Concept

  • Debunking: The comment suggests that "strength" is badly used because the individual has different ratings in different time controls (1500 in blitz and 2000+ in rapid). However, this doesn't reflect a misunderstanding of "strength" as a concept—it shows that strength is time-control specific. In chess, strength varies across different formats due to the different skill sets required (quick decision-making in blitz vs. more calculated play in rapid). This does not invalidate the concept of strength; it simply highlights that strength is context-dependent.

Strength in chess refers to a player's ability to win games against others in a given format or time control. Just because someone’s rapid rating is significantly higher than their blitz rating doesn’t mean the term "strength" is misused—it simply means they are stronger in one format than another.

2. "Low Elo = Less Skill" and Initial Placement

  • Debunking: The comment says, "you start low because low Elo = less skill" and implies that starting low is natural because higher-rated players will push you down if you start high. This is not entirely accurate. In most Elo systems, starting low is a design choice, not necessarily an inherent reflection of skill.

The starting rating is arbitrary and is set by the system (for example, FIDE's starting rating for new players is typically around 1000-1200). A player’s rating only becomes a reflection of their skill after they’ve played a sufficient number of games and the system has adjusted to their true level. Starting everyone at a low Elo allows the system to avoid overestimating new players' strength, but it’s not a fundamental rule of Elo systems that “low Elo = less skill.”

A player’s starting rating could just as easily be set at a higher value, and they would still converge to their correct rating through wins or losses. The notion that starting low is necessary because high-rated players would push you down is true in practice but only because the system needs time to adjust to actual performance, not because starting low is inherently tied to less skill.
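As a rough illustration of that convergence (a toy model with one big assumption: the player is always paired against an accurately rated opponent at exactly their own current rating, with K=16), the starting rating mostly determines how long the climb takes:

```python
import random

K = 16

def expected(ra, rb):
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))

def games_to_converge(start, true_strength=1500, target=1450, max_games=5000):
    """Games a player of fixed true strength needs to climb from `start` to within 50 of it,
    assuming every opponent is correctly rated and rated exactly at the player's current rating."""
    random.seed(start)
    rating = float(start)
    for game in range(max_games):
        if rating >= target:
            return game
        opponent = rating                                            # equal-rated, accurately rated opponent
        win = random.random() < expected(true_strength, opponent)    # true win chance of the 1500-strength player
        rating += K * ((1.0 if win else 0.0) - expected(rating, opponent))
    return max_games

for start in (200, 800, 1400):
    print(f"start at {start}: about {games_to_converge(start)} games to reach ~1450")
```

With K=16 the climb from 200 is bounded below by roughly (1450 - 200) / 8 ≈ 156 games even with a near-100% win rate, which is the practical cost of a low starting rating.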

3. Relativity of Ratings

  • Debunking: While it’s true that ratings are relative, the comment implies that ratings naturally find their place based solely on results over time. However, the quality of the matchmaking process (i.e., who you are paired with) plays a significant role in how quickly your rating stabilizes to reflect your true skill. For instance, if a player is frequently paired with others whose ratings are far from their own, their rating changes may not accurately reflect their actual skill level for a long time. In systems with many players or fragmented matchmaking, relative ratings can get distorted.

Ratings can reflect skill more accurately in tightly connected systems where players regularly face opponents from a wide range of skill levels. In larger, fragmented systems, ratings may become inflated or deflated due to limited crossover between skill brackets.
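A small sketch of that distortion (assumed setup: two pools that never play each other, one genuinely 400 points stronger across the board, everyone starting at 1000 with K=16):

```python
import random

K = 16

def expected(ra, rb):
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))

def simulate_closed_pool(skills, games=20000, seed=0):
    """A pool whose players only ever meet opponents from the same pool."""
    random.seed(seed)
    ratings = [1000.0] * len(skills)
    for _ in range(games):
        a, b = random.sample(range(len(skills)), 2)
        score_a = 1.0 if random.random() < expected(skills[a], skills[b]) else 0.0
        delta = K * (score_a - expected(ratings[a], ratings[b]))
        ratings[a] += delta
        ratings[b] -= delta
    return ratings

random.seed(42)
weak_pool   = [random.gauss(1200, 150) for _ in range(100)]   # hidden strengths
strong_pool = [s + 400 for s in weak_pool]                    # same shape, 400 points stronger

r_weak   = simulate_closed_pool(weak_pool, seed=1)
r_strong = simulate_closed_pool(strong_pool, seed=2)

# Ratings inside each closed pool are zero-sum, so both pools keep the same average rating
# even though every player in the second pool is 400 points stronger in reality.
print("average rating, weaker pool:  ", round(sum(r_weak) / len(r_weak)))
print("average rating, stronger pool:", round(sum(r_strong) / len(r_strong)))
```

The same number therefore means different real strength depending on which pool it came from, and the mismatch only becomes visible when the pools finally mix.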

4. "FIDE Does NOT Use Rating Floors"

  • Debunking: This point is misleading. While FIDE doesn’t impose formal "rating floors" the way some other rating systems (like the United States Chess Federation, USCF) might, there are practical rating floors in place. FIDE requires players to reach a rating of 1000 before they get a published rating. Ratings below 1000 are not recorded in official databases.

In effect, this does impose a floor on rating visibility, even if it’s not a technical rating floor. Additionally, once players are over 1000, there’s no minimum threshold below which they can’t drop. However, FIDE’s starting rating and lack of a formal floor make it a more gradual system, unlike others that might cap how low a rating can go.

5. "1800's Beat 1400's Most of the Time" and Win Probability

  • Debunking: This is broadly correct—Elo is designed to predict win probabilities, and higher-rated players will beat lower-rated players more often. However, the comment oversimplifies the situation by ignoring factors such as rating distortions, localized pools, or inflated/deflated ratings in certain regions or communities.

In large pools with limited crossover, ratings might not reflect true skill accurately, meaning an 1800-rated player from one isolated pool may not be as strong as an 1800-rated player from another pool. The same applies to 1400-rated players. The math behind Elo systems doesn't account for these external distortions, and it can create a sense of unpredictability when players from different pools meet.

Additionally, upsets and variance are a key part of Elo, as noted, but this doesn’t mean the system is immune to external problems like insufficient game data, fragmented player bases, or mismatched pairings.
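For the concrete pairing mentioned above, the same expected-score formula puts a 400-point gap (1800 vs 1400) at roughly a 10:1 expectation:

```python
p = 1.0 / (1.0 + 10 ** (-400 / 400))   # expected score of the 1800 against the 1400
print(round(p, 3))                      # ~0.909, i.e. "most of the time", barring the distortions above
```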

6. "Math Doesn't Discriminate"

  • Debunking: While it's true that "math doesn't discriminate" in a literal sense, this statement overlooks that the system's design can unintentionally create biases. The Elo system, or any rating system, is only as good as the data and conditions it is applied to. Poor matchmaking algorithms, large player pools with fragmented interaction, or localized inflations can lead to unintended biases or imbalances in the ratings system.

For example:

  • Players who live in regions with weaker pools can have inflated ratings, and when they compete with players from stronger regions, their rating may be deflated quickly.
  • The system might "discriminate" by failing to accurately reflect improvement or decline if players are only matched with opponents from the same localized pool.

So while the mathematical calculations of Elo are impartial, the system itself can be skewed by external factors.

Conclusion:

The comment oversimplifies the way Elo and ratings work in complex, large-scale environments. While Elo is a useful tool for predicting win probabilities, it has limitations when applied to large, fragmented pools, and ratings can become distorted if the system isn't properly managed. Key points like the importance of player crossover, the accuracy of win probabilities, and environmental factors like player base size and distribution are missing from the comment. Moreover, while math is impartial, the application of the system can introduce biases and distortions that affect the fairness of the outcomes.

IndianCamels

They don't have pools based on region. I've played people from Libya, Saudi Arabia, South Africa, India, America, Canada

IndianCamels

The vast majority of players do not have localized pools. In fact, localized pools produce higher ratings

basketstorm
chesssblackbelt wrote:

I told AI to make the above comment shorter, if anyone wants to read the short version:

The author observes that players on the chess site often mismatch their time controls, playing slow games with a fast-paced mindset and vice versa. They emphasize the importance of striving for checkmate wins rather than relying on time victories, arguing that true success in chess should focus on achieving checkmate. The author also highlights concerns about fair play in online chess compared to over-the-board games, suggesting that online play should mirror traditional standards. Ultimately, they advocate for players to choose their time controls based on their preferred pace of play.

Yes, that could be a problem, and it's easy to track if you count losses by timeout versus wins by the opponent's timeout. That could tell whether you're faster than your opponents.
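A minimal sketch of that bookkeeping (the result labels here are just placeholders for however you export your games, not any particular site's API):

```python
# One (my_result, opponent_result) pair per game, in whatever form your export provides.
games = [
    ("win", "timeout"),       # opponent flagged
    ("timeout", "win"),       # I flagged
    ("win", "checkmated"),
    ("timeout", "win"),
]

i_flagged    = sum(1 for mine, _ in games if mine == "timeout")
they_flagged = sum(1 for _, theirs in games if theirs == "timeout")

print("games I lost on time:          ", i_flagged)
print("games my opponent lost on time:", they_flagged)
```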

Looking at my bullet stats, you could tell I play with a hyper-bullet mindset while my opponents tend more toward a blitz mindset:

IndianCamels

"A bad craftsman blames his tools"

basketstorm
IndianCamels wrote:

They don't have pools based on region. I've played people from Libya, Saudi Arabia, South Africa, India, America, Canada

Easy: timezone differences. People need to sleep, and people are more productive when they're awake rather than sleepy, etc.; that's how you end up in different pools here on chess.com. Look up your stats against different countries, by the way; if you've played a lot, you'll see certain patterns, like one country appearing stronger in general.

IndianCamels

Country of origin has nothing to do with skill. That definitely sounds bad. Don't make generalizations.