Checking if Elo system is oppressive [With proofs]

IndianCamels

This is how I practice for my AP Stats test grin

basketstorm
IndianCamels wrote:

Your input info is correct, but you can't use ChatGPT to run it.

You can, at least in the paid version. And if it fails to run the computations in the cloud, it will offer instructions for running them on your own computer.

IndianCamels wrote:

Also, why would you say Elo is oppressive? Are you saying that you and a 2000 on chess.com are the same skill, but the Elo system is working against you, making sure you don't succeed, that getting good is nothing but a joke, and that all Grandmasters are lazy?

No, I'm not saying that a 2000 on chess.com and I are the same skill but the Elo system is working against me. It is working against other players. My Blitz Elo should be negative, or at least below 100, and the same goes for many other players. That way we would have no inflation and fair rankings: someone is stronger, someone is weaker. Not like now, where a broad range of variously skilled players is squeezed into a tight low-Elo band.
And the rating uncertainty shouldn't shrink so quickly. When I lose to a player with a high RD, I still lose 8 points while he gains around 24. It's fine for him to gain 24, maybe. But for me to lose 8, as if his rating were as accurate as mine? Not fair. And there are many other examples, like the rating-floor issue I already described in other posts.

basketstorm
IndianCamels wrote:

Also, if you did an actual experiment, you would get a bell curve in your histogram. If you look at the actual placement of chess.com ratings, you would realize they are just a right-skewed bell curve. Describing the data accurately doesn't matter if the data is wrong. Compare your rating distribution to an actual rating distribution (go to rapid stats, global; the mean rating is 620) and you'll see your model is terribly inaccurate.

I explained why the chess.com leaderboard looks like this. The peak on that "bell" curve is the initial rating assigned to the many players who didn't play enough games to change it significantly. In my simulation everyone played 1000 games, hence no peak there.

basketstorm
llama_l wrote:
IndianCamels wrote:

This is how I practice for my AP Stats test

Just a reminder to the OP: don't use ChatGPT to do school math, kids

I am familiar with Desmos and Wolfram. However, I didn't use ChatGPT just to do the "math", as I've already explained here.

basketstorm
llama_l wrote:
basketstorm wrote:
llama_l wrote:
IndianCamels wrote:

This is how I practice for my AP Stats test

Just a reminder to the OP: don't use ChatGPT to do school math, kids

I am familiar with Desmos and Wolfram. However, I didn't use ChatGPT just to do the "math", as I've already explained here.

Well, you can test it in reality. Have a player rated whatever create a new account and see if they get stuck at something a lot lower.

I've made over 100 accounts in my time (which began before chess.com existed). I've never had an issue going to my correct rating quickly.

I think there's a certain threshold in strength after which it's easier to sort of "break through" the gates of Elo oppression. But below that threshold there are many levels, and those levels are ignored by the system; players aren't ranked properly for the reasons I've repeated many times. Between 100 (important: clipped at 100) and 200, with 8 as the increment, it's impossible to fit a broad range of skill accurately, no matter how many games these players play. And it's easy to simulate.

I understand that players can have bad/good days, etc. That shouldn't be an excuse in the chess world: if you have more bad days than good days, then as far as everyone else is concerned, your skill has simply dropped. For the simulation I'm assuming good/bad days average out.
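A minimal sketch of such a simulation (my own construction, not the OP's actual code): standard Elo updates with an assumed K of 16, starting rating 200, and a hard floor at 100. The "true skill" values are illustrative, with many of them deliberately placed below the floor:

```python
import random

K = 16
FLOOR = 100
START = 200

def expected(ra, rb):
    """Standard Elo expected score for player A against player B."""
    return 1 / (1 + 10 ** ((rb - ra) / 400))

random.seed(0)
# hypothetical "true" skills spanning a wide range, many below the floor
players = [{"skill": s, "rating": START} for s in range(-300, 701, 10)]

for _ in range(100_000):
    a, b = random.sample(players, 2)
    # the game's outcome is driven by true skill...
    score_a = 1.0 if random.random() < expected(a["skill"], b["skill"]) else 0.0
    # ...but the rating update only sees displayed ratings, clipped at the floor
    ea = expected(a["rating"], b["rating"])
    a["rating"] = max(FLOOR, a["rating"] + K * (score_a - ea))
    b["rating"] = max(FLOOR, b["rating"] + K * ((1.0 - score_a) - (1.0 - ea)))

low = [p for p in players if p["rating"] < 200]
span = max(p["skill"] for p in low) - min(p["skill"] for p in low)
print(f"{len(low)} players rated below 200, spanning {span} points of true skill")
```

With these assumptions, players whose true skills differ by hundreds of points end up packed into the same narrow band just above the floor, despite playing thousands of games.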

IndianCamels
chesssblackbelt wrote:

If ratings are accurate, then how come janko isn't 2900????

With janko on his side, the OP has many more people to fend off now.

basketstorm
llama_l wrote:

1 day old accounts, but I've never seen an example of that actually happening.

I've seen low-Elo players (~150) who have played here for months or even years. They play much, much better than me, but get paired with me because I'm 150 or so. At the same time, some ~200-rated players play worse than me.

basketstorm
llama_l wrote:

Fun rating trivia...

You and an opponent have the same rating and will play two games.

Is it better for your rating if you win the first game and lose the second? Or the other way around? Or it doesn't matter which order you win/lose?

On chess.com the rating change is rounded, or something like that.

If you are both 200, after the first game the ratings would be 208 and 192.
If you then win the second game you should be awarded 7.6318 points, but on chess.com it's likely rounded to 8 again. If you lose, you should lose 8.3682 points, but due to rounding you lose only 8 again.

So in a non-rounded system, if you first win and then lose, you'll end at 199.632; and if you first lose and then win, you'll end at 200.368.

MasterJyanM
basketstorm wrote:
RandomChessPlayer62 wrote:

800s aren't matched with 200s

Happens often in tournaments.

yeah, 350 against 1100

MasterJyanM
jankogajdoskoLEM wrote:
chesssblackbelt wrote:

If ratings are accurate then how comes janko isn't 2900????

Elo gatekeeping, my dear friend. I was 2311 FIDE and yet I can't get at least that here? Preposterous!!

we don't need this again sad

RandomChessPlayer62

How did basketstorm get what they say is an accurate ChatGPT? When I ask ChatGPT for code for a simple program in Python, its answer is a mix of outdated information and information about different programming languages.

MasterJyanM
RandomChessPlayer62 wrote:

How did basketstorm get what they say is an accurate ChatGPT? When I ask ChatGPT for code for a simple program in Python, its answer is a mix of outdated information and information about different programming languages.

use a ChatGPT 100.0 and it still won't work LOL

MasterJyanM
MasterJyanM wrote:
RandomChessPlayer62 wrote:

How did basketstorm get what they say is an accurate ChatGPT? When I ask ChatGPT for code for a simple program in Python, its answer is a mix of outdated information and information about different programming languages.

use a ChatGPT 100.0 and it still won't work LOL

what about Javascript?

basketstorm
RandomChessPlayer62 wrote:

How did basketstorm get what they say is an accurate ChatGPT? When I ask ChatGPT for code for a simple program in Python, its answer is a mix of outdated information and information about different programming languages.

Did you use the paid ChatGPT or the free/limited one? What was your request message?

RandomChessPlayer62
basketstorm wrote:
RandomChessPlayer62 wrote:

How did basketstorm get what they say is an accurate ChatGPT? When I ask ChatGPT for code for a simple program in Python, its answer is a mix of outdated information and information about different programming languages.

Did you use the paid ChatGPT or the free/limited one? What was your request message?

Free. I don't remember the request message, but it included the version of Python I was using.

basketstorm

Idk, I can't say much without seeing the actual request. In my experience it's all about the request: bad request = bad response. And of course the free version is much weaker, but it should let you use the premium model a couple of times per day.

RandomChessPlayer62

Also, maybe people are stuck at low Elo in the simulation because there isn't enough Elo in the system?

RandomChessPlayer62

Also, I would like the code for your simulation.

basketstorm

Not everyone is stuck; strong players gain lots of Elo, so the system has it.
But that doesn't matter, since low-Elo players exchange points with each other in a localized pool, plus there's "infinite" Elo generation at the floor (100): play a game against a 104-rated player as a 100-rated player and you gain +8 while he loses only -4, so +4 Elo is added to the system. In that case the floor is the generator of Elo, not the initial rating. On lichess the initial rating is 1500, and that serves more as the source of the Elo.

This is very simple; I don't get why people can't see it. You don't even need simulations: the real situation on chess.com is the best "simulation" and proof. With such parameters (200 as the starting Elo, an RD that stabilizes after just a few games, so each game changes your rating by around +/-4% when you're near 200, which is huge), it's impossible to "sort" players by skill below a certain strength threshold. There's simply not enough precision. A good portion of the skill range gets squeezed into a narrow band of ratings, inside which players are sorted rather chaotically. There's no "room" to distribute players by skill level.

It's not about a rigging conspiracy or anything like that, although from a player's perspective the game does look rigged, because they keep meeting identically rated opponents of wildly varying strength.

Many commenters here seem to believe that the Elo rating system will "somehow", "magically" sort things out on its own, because "that Arpad guy was smart" or because "conspiracies are stupid". But I think we should look deeper and search for stronger arguments; we should stop relying on beliefs. "Conspiracy theories are stupid, therefore the Elo rating system for online chess is fair"? No. There's a logical mistake here; there's no connection between those two statements. You can't use the conspiracy topic to justify the accuracy of the Elo rating system; that would be just as unserious as the conspiracy itself.

After reading my comment, ChatGPT agrees. Please read at least the conclusion:

You've touched on a significant issue with rating systems, particularly when dealing with a large number of players who have similar skill levels. Let's delve deeper into the points you raised:

1. Initial Conditions and Rating Dynamics:

Starting all players at a similar initial rating (e.g., 200 Elo) and having a rapidly stabilizing Rating Deviation (RD) means that the system can quickly converge to a seemingly stable rating, but this stability is somewhat superficial. The initial burst of games might not be enough to accurately capture the true skill levels of the players, especially if the number of games required for stabilization is relatively low.

2. Precision and Granularity:

With a narrow rating band and limited precision in rating changes (like +/- 4% per game), small differences in skill levels get compressed. This results in a lack of granularity where players with slightly different skills end up with very similar ratings. When ratings are clustered too tightly, it becomes difficult for the system to distinguish between different levels of skill accurately.

3. Chaotic Sorting Within Bands:

When players are squeezed into a narrow rating band, the system's ability to accurately sort them by skill diminishes. Small variations in performance, or even short-term fluctuations in form, can cause significant rating changes. This can create a perception of randomness or chaos, where similarly rated players exhibit widely varying levels of skill in actual matches.

4. Threshold Effects:

There's a threshold below which the rating system fails to accurately reflect skill differences. Players within this threshold band can have ratings that don't truly represent their abilities, leading to the chaotic sorting you mentioned. This is exacerbated in large player pools with many players falling into this narrow skill range.

5. Improvement and Adaptation:

As players improve or adapt their strategies, their ratings should ideally reflect these changes. However, in a system with limited precision and narrow bands, it takes a longer time for such improvements to be reflected in ratings. This lag can cause frustration and the perception that the system is rigged or unfair.

6. Possible Solutions:

To address these issues, several adjustments can be made to the rating system:

  • Increase Initial Rating Range: Start players with a wider range of initial ratings to better capture early skill differences.
  • Adjust Rating Change Magnitude: Implement dynamic rating changes where the magnitude of change depends on the number of games played and the confidence in a player's rating.
  • Introduce More Granularity: Use a system with more granular rating increments to better differentiate players with similar skill levels.
  • Regular Recalibration: Periodically recalibrate the ratings of all players to ensure that the system remains accurate over time.
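The "adjust rating change magnitude" suggestion above amounts to a games-played-dependent K-factor (FIDE does something similar for new players); the schedule below is purely illustrative:

```python
def k_factor(games_played):
    """Illustrative K schedule: large changes while provisional, smaller once established."""
    if games_played < 30:
        return 40
    if games_played < 100:
        return 20
    return 10

def update(ra, rb, score_a, games_a, games_b):
    """Elo update where each player's K depends on their own game count."""
    ea = 1 / (1 + 10 ** ((rb - ra) / 400))
    ka, kb = k_factor(games_a), k_factor(games_b)
    return ra + ka * (score_a - ea), rb + kb * ((1 - score_a) - (1 - ea))

# a new player beats an established one at the same rating: big gain, small loss
ra2, rb2 = update(1200, 1200, 1, games_a=5, games_b=500)
print(ra2, rb2)   # 1220.0 1195.0
```

Note that with per-player K values the exchange is no longer zero-sum, which is one source of the rating inflation and deflation such systems must manage.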

Conclusion:

The issues you've highlighted are inherent in any rating system that lacks sufficient granularity and precision, particularly when dealing with large pools of similarly skilled players. Understanding these limitations is crucial for players and developers alike. While it may seem simple, the complexities of accurately measuring and sorting player skills require continuous refinement and adaptation of the rating system to ensure fair and accurate matchmaking. Elo was originally designed for smaller, more contained environments (like chess tournaments) where players regularly compete against each other, and the system could rely on relatively dense, connected graphs of player interactions. When applied to large-scale systems like online games with global player bases, some critical issues arise, and you've identified them well.
In large-scale environments with fragmented pools, expecting Elo to represent true win probabilities is fundamentally flawed. Without enough crossover between localized groups, the ratings become distorted, leading to inaccurate predictions of outcomes when players from different pools meet. Elo is most effective when there's a dense, connected graph of players, but in massive, globally distributed games, ratings quickly lose their ability to reflect reality. A better approach would involve recognizing these limitations and perhaps using regionally adaptive systems or dynamic recalibration to mitigate the issues of fragmentation and distortion.
Many systems assume that Elo will smoothly scale with a larger player base, but this expectation fails because Elo doesn't adapt well to sparsely connected graphs, nor does it handle fragmented competition well. In reality, win probabilities depend on how well the graph of player interactions is connected, which becomes much more difficult as the pool grows larger.

RandomChessPlayer62
chesssblackbelt wrote:
RandomChessPlayer62 wrote:

Also, maybe people are stuck at low Elo in the simulation because there isn't enough Elo in the system?

They are stuck because chess is one of the most played games in the world, and it's difficult to improve at because it's so competitive.

I don't disagree, but I was talking about a separate flaw in basketstorm's "proof"