That doesn't mean the skill level is the same.
It takes more skill to gain 100 Elo the higher rated you get.
Again, we can't talk about how hard it is to climb from 100 Elo, because a wide range of skills is hidden behind 100 Elo here.
Elo oppression is relentless. Abolish Elo!!! No Elo means not being stuck and gatekept.
Abolishing Elo means you'll get destroyed by an actual GM.
I hate math; logic is much better. If your Elo is stable, you've reached your skill level. If you were rated lower than your skill level, your Elo would rise, because you'd be facing easier opponents and winning more than losing; if you were rated higher than your skill level, your Elo would drop, because you're facing harder opponents and losing more than winning. Simple.
It's not about individual players being rated lower than their skill level.
For some people, 100-ish ratings are actually too high. Some players in this pool should be rated 50-60, or even negative 100.
It's about the Low-Elo Purgatory of Chaos that millions of people are stuck in, where their skill is mostly ignored by the system. It's also about localisation. Please read my posts; I've explained everything several times.
A 100-Elo difference is a ~14% shift in winning chances; 60 Elo is a ~8.5% shift. Just because 60 or even 100 seems like a small number (compared to what, a million?) doesn't mean it can be neglected when we measure a player's performance.
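For reference, those percentages fall straight out of the standard Elo expectation formula, E = 1 / (1 + 10^(-diff/400)). A quick check in Python (nothing here comes from anyone's original script):

```python
def expected_score(diff: float) -> float:
    """Expected score for the higher-rated side at an Elo difference `diff`."""
    return 1 / (1 + 10 ** (-diff / 400))

# Shift in winning chances relative to an even (50/50) matchup:
print(f"{expected_score(100) - 0.5:.3f}")  # 0.140 -> ~14% for a 100-Elo gap
print(f"{expected_score(60) - 0.5:.3f}")   # 0.085 -> ~8.5% for a 60-Elo gap
```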
Your simulation had players with actual strength ~1000 ending up near the starting-point Elo. There is no way the math or the parameters check out for the script you used.
You've misunderstood the math, and your comment demonstrates a common misconception: the assumption that the Elo rating system "somehow", "magically" learns some absolute value, some true player strength. You'd expect all ratings to grow, leaving the starting Elo behind? That's not possible, and it's important to understand why: Elo is relative. You'd have to keep extending the room on both sides to keep sorting all the players by their skill. This does not happen on chess.com.
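To illustrate why ratings can't all drift upward: in the textbook K-factor update, whatever one player gains the other loses. A minimal sketch, assuming a single shared K (real systems vary K per player, but the relative nature is the same):

```python
K = 16  # arbitrary illustrative K-factor, not taken from the simulation

def expected(r_a: float, r_b: float) -> float:
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def update(r_a: float, r_b: float, score_a: float) -> tuple[float, float]:
    """One rated game: score_a is 1.0 (win), 0.5 (draw) or 0.0 (loss) for A."""
    delta = K * (score_a - expected(r_a, r_b))
    return r_a + delta, r_b - delta  # B loses exactly what A gains

a, b = 100.0, 132.0
a, b = update(a, b, 1.0)  # the 100-rated player wins
print(a, b, a + b)        # total stays 232.0: zero-sum with equal K
```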
And 1000 strength in the simulation was actually the minimum strength in the pool. So it's normal and expected that the weakest (1000-strength) players received mostly low ratings. What's not normal, however, is that noticeably stronger players also remained in that low-Elo category. It's not a flaw of the simulation, because we see the identical situation in real life here on chess.com: the apparent strength of low-Elo players differs significantly.
The point is: in a fair system, a difference in strength must be reflected in the rating difference. Here that's simply not possible, because of the severe localisation of pairings that is inevitable in a player base of this size. The low starting Elo and the rating floor contribute to the problem.
"id player_name hidden_strength_Elo final_rating_Elo
266 Hikaru Gajdosko 1112 100
322 Magnus Capablanca 1003 132"
Don't know what you're talking about, but here's a quote from your results.
I'm sure your formula included a win/loss/draw probability for a player of hidden strength x against a player of hidden strength y. In which case, the quoted results should have a very low probability of occurring.
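For scale, here is the per-game edge implied by those two hidden strengths under the standard Elo curve. This is only a head-to-head figure; the actual opponents each player faced in the run aren't shown:

```python
def expected_score(strength_a: float, strength_b: float) -> float:
    """Win expectancy implied by hidden strengths, standard Elo curve."""
    return 1 / (1 + 10 ** ((strength_b - strength_a) / 400))

# The 1112-strength player is expected to score ~65% per game against the
# 1003-strength player, yet finished 32 rating points below them.
print(f"{expected_score(1112, 1003):.2f}")  # ~0.65
```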
"id player_name hidden_strength_Elo final_rating_Elo
266 Hikaru Gajdosko 1112 100
322 Magnus Capablanca 1003 132"
Dont know what youre talking about, but heres a quote from your results.
Im sure your formula included a win/loss/draw probability value for x elo player against y elo player. In which case the quoted results should have very low probability of occurring.
Couldn't understand you, can you elaborate, please?
"id player_name hidden_strength_Elo final_rating_Elo
266 Hikaru Gajdosko 1112 100
322 Magnus Capablanca 1003 132"
Dont know what youre talking about, but heres a quote from your results.
Im sure your formula included a win/loss/draw probability value for x elo player against y elo player. In which case the quoted results should have very low probability of occurring.
Couldn't understand you, can you elaborate, please?
I think they're saying that it shouldn't happen like that.
The problem with this is that all the players seem to have been given "hidden strengths" that are far above their online ratings.
This assumes that all players are underrated. This doesn't reflect reality, as it leaves out a considerable portion of the chess population: players who are overrated.
It also seems to ignore players who are accurately rated.
If we want a more accurate simulation, we need a distribution of players who are a mix of underrated, overrated, and accurately rated - just as it is online.
seem to have been given "hidden strengths" that are far above their online rating.
Strength doesn't have to match rating. The rating offset is arbitrary; only rating differences matter, and those should reflect differences in strength. That's what doesn't work for low Elos.
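The "only differences matter" property is visible directly in the expectation formula: shift both ratings by the same constant and nothing changes. A quick check:

```python
def expected(r_a: float, r_b: float) -> float:
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

# Same 100-point gap at two different offsets -> identical expectancy:
print(expected(100, 200))    # ~0.36 for the lower-rated side
print(expected(1100, 1200))  # exactly the same: only the difference matters
```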
I see your point.
Though at lower levels, I'd say chess outcomes become less reliant on skill and more reliant on chance: the lower players are in skill, the more random their decisions and move choices become, due to a lack of understanding.
This might account for the apparent "luck" variable, in which ratings seem to fluctuate wildly at lower levels - as the reasoning players use to decide their moves is also fluctuating wildly.
We also need to consider the coding that was used to run the simulation, to see exactly just how "random" the output is designed to be.
Saying, "Look how wildly these ratings fluctuate" isn't terribly convincing, if the coding itself is intended to produce fluctuations in the first place ...
Don't think the all-underrated starting distribution should be an issue: as long as the pool includes all ranges of strength, and the formula is correct, players should sort themselves out just by playing each other.
You don't need simulations and code to see that 8/200 = 4%.
Now you blame me, saying I did something wrong with the code; that's a weak argument. I provided enough information about the flaws of the system, and that requires no simulation, no code, just some common logic. But if you understand computer coding, you could run your own simulation. I'm curious, though: what are you trying to prove? That between 100 and 200 Elo, with +/-8 increments (when Elos are equal), we can fit a wide range of skills? How? Magically?
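In that spirit, here is a minimal sketch of such a simulation. The original script wasn't posted, so every parameter below (pool size, K-factor, pairing window, starting Elo, rating floor, game count) is an assumption, loosely following the setup described in this thread:

```python
import random

K, START, FLOOR, WINDOW = 16, 100, 100, 50  # all assumed, not from the original

def expected(r_a: float, r_b: float) -> float:
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

# Hidden strengths: 90% between 1000 and 1400, 10% between 1400 and 2800.
players = [
    {"strength": random.uniform(1000, 1400) if random.random() < 0.9
                 else random.uniform(1400, 2800),
     "rating": START}
    for _ in range(1000)
]

for _ in range(50_000):  # many rated games (draws omitted for simplicity)
    a = random.choice(players)
    # Localized pairing: opponents come from a narrow window around a's rating.
    pool = [p for p in players
            if p is not a and abs(p["rating"] - a["rating"]) <= WINDOW]
    if not pool:
        continue
    b = random.choice(pool)
    # The result is decided by hidden strength; the update uses current ratings.
    score = 1.0 if random.random() < expected(a["strength"], b["strength"]) else 0.0
    delta = K * (score - expected(a["rating"], b["rating"]))
    a["rating"] = max(FLOOR, a["rating"] + delta)
    b["rating"] = max(FLOOR, b["rating"] - delta)

# Inspect the bottom of the table: hidden strength vs. final rating.
for p in sorted(players, key=lambda p: p["rating"])[:5]:
    print(round(p["strength"]), round(p["rating"]))
```

Whether the ratings spread out or stay bunched near the floor depends heavily on the assumed pairing window and game count, which is exactly the calibration question being argued here.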
They don't "sort themselves out by playing each other": the localized pool of low-Elo players is always there, and it doesn't matter whether the rest are rated higher, fairly or unfairly - there's no interference between the pools.
I'm not "blaming", "arguing", or trying to prove anything. I'm simply pointing out that considerations need to be made.
Thinking critically means considering things from as many reasonable angles as we can before drawing conclusions.
Since you used ChatGPT for this, have you asked ChatGPT about the results? I'll feed your original post into it and ask for its feedback, out of curiosity.
--- This was what it had to say. Take note of the final sentence, which, interestingly enough, aligns with my previous post.
ChatGPT: "The experiment described raises some interesting points about the behavior of rating systems like Elo, but it's also important to consider several factors to assess its validity and conclusions. Here’s a breakdown of key aspects and considerations:
Rating Distribution and Simulation:
Initial Setup: The simulation starts with 1,000 players, where 90% have a hidden strength between 1,000 and 1,400 Elo, and 10% have a hidden strength between 1,400 and 2,800 Elo.
Simulation Mechanics: The pairing is based on Elo ratings with a small diffusion to emulate online presence, and the game results are dictated by the actual hidden strength of the players.
Expected Results:
High Elo Players: In a well-designed Elo system, players with higher hidden strength should naturally end up with higher ratings over many games. This part of the experiment seems to align with expected behavior.
Low Elo Players: Players with lower hidden strength might see more variability in their final rating due to the stochastic nature of games and the relative strength of opponents. In a large pool, random fluctuations can cause significant discrepancies, especially in systems that are not perfectly calibrated.
Key Observations:
Rating Floor and Bumping: The observation that players with a lower hidden strength might end up with a wide range of final ratings can be attributed to the randomness of the results in a simulated environment. If the rating system is designed with a floor (e.g., 100 Elo), then it prevents ratings from falling below a certain point, but it doesn’t necessarily prevent a large variance around that floor.
Impact of Luck: In a probabilistic system, some players might experience a "lucky streak" or "unlucky streak" that causes their rating to deviate from their hidden strength. This is a known phenomenon in rating systems, and it can be exacerbated in smaller pools or with fewer games played.
Analysis and Fairness:
Large Variability: Large differences between actual strength and final rating (e.g., a player with hidden strength 1,200 having a final rating of 355) might occur due to the nature of the rating system's adjustments and the randomness inherent in match outcomes.
System Calibration: Ideally, after a large number of games, ratings should converge to approximate a player’s true strength. However, the exact rate of convergence can depend on many factors, including the number of games played and the volatility of the rating adjustments.
Conclusion:
The observed discrepancies are not necessarily indicative of flaws in the Elo system but might reflect the natural variability and randomness in simulations with finite numbers of games and players. Real-world rating systems also experience such variances, though they might be less extreme due to different factors such as player base size and rating adjustments.
In summary, the experiment illustrates some inherent properties and limitations of rating systems like Elo, especially in terms of how ratings can fluctuate due to randomness and the number of games played. The wide variability observed for lower-rated players could be a result of the simulation parameters and randomness, rather than an inherent flaw in the Elo system itself."
I was talking about a situation where players of different strengths have a similar starting Elo, like in your simulation. Of course they play each other.
Low Elo is chaotic not because of the system but because of the players at low Elo. Low Elo is a trap because players play randomly before eventually learning strategy. This means a low-Elo player may randomly make an absolutely brilliant move that wins the game by accident, trapping low-skill players in the chaos until they become strong enough to withstand the occasional brilliancy and exploit the frequent blunders. It has nothing to do with the rating system itself. In your simulation, the low 1000s replace the chaotic 100-300s, but because your simulation uses probability to represent strategy, they become chaotic too, requiring extreme skill to escape the new low-Elo zone and trapping the 1000s away from their real rating. It's not a product of the system but a quality of the players whose real rating is low Elo, causing chaos that can't easily be escaped by low-Elo players.
just do a bunch of tourneys