If a 1200 is stuck down with 100s and beating them, they should be moving up; if they're losing to 1300s, then the 1300s should be moving up; if the 1300s are losing to 1400s, then the 1400s should be moving up, and so on. Common sense and logic outweigh a semi-coherent nonsense machine (ChatGPT)!
Checking if Elo system is oppressive [With proofs]

The reason most players are at low Elo is most likely that they play casually online, unlike in something like an OTB tournament.

Using ChatGPT powers I simulated 1,000,000 chess games in a pool of 1,000 players. Pairing was rating-based with a small diffusion to emulate the online-presence factor. Win/loss updates were exactly as prescribed by Elo. All players had a hidden strength in Elo: 90% of players from 1000 to 1400, 10% of players from 1400 to 2800. Initial rating was 200, rating floor was 100.
Where are we getting this "hidden strength" from? What source do you have here? If you're using accuracy or a Stockfish-estimated Elo, I could blunder every second move, but because my opponent blundered EVERY move it might say I was brilliant. Stockfish will say that I am a 1400 when my accuracy says I'm 300 and my current Elo is 500. Actually, no, it isn't even Elo, it's Glicko; every time I say Elo I mean Glicko, because chess.com uses Glicko.
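For concreteness, here's a minimal sketch of what a simulation like the quoted one could look like. The player count, strength distribution, starting rating, and floor are taken from the post above; the K-factor and the pairing "diffusion" width are my own assumptions, since the post doesn't specify them:

```python
import random

N_PLAYERS, N_GAMES = 1000, 1_000_000  # as in the quoted post; lower N_GAMES for a quick run
START, FLOOR, K = 200, 100, 16        # K = 16 is an assumption, not from the post

# Hidden strength: 90% of players from 1000-1400, 10% from 1400-2800
strength = [random.uniform(1000, 1400) if random.random() < 0.9
            else random.uniform(1400, 2800) for _ in range(N_PLAYERS)]
rating = [float(START)] * N_PLAYERS

def expected(ra, rb):
    """Standard Elo win expectancy of A against B."""
    return 1 / (1 + 10 ** ((rb - ra) / 400))

for _ in range(N_GAMES):
    a = random.randrange(N_PLAYERS)
    # Rating-based pairing with noise ("diffusion"); the width of 100 is a guess
    target = rating[a] + random.gauss(0, 100)
    b = min((p for p in range(N_PLAYERS) if p != a),
            key=lambda p: abs(rating[p] - target))

    # The game's outcome is decided by hidden strength, not by displayed rating
    score_a = 1.0 if random.random() < expected(strength[a], strength[b]) else 0.0

    # The Elo update uses the displayed ratings, clamped at the floor
    ea = expected(rating[a], rating[b])
    rating[a] = max(FLOOR, rating[a] + K * (score_a - ea))
    rating[b] = max(FLOOR, rating[b] + K * ((1.0 - score_a) - (1.0 - ea)))
```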

Comparing humans to Stockfish isn't reliable. I could possibly beat Stockfish not because I'm actually 9999 Elo (I'm not), but because it assumes its opponent will play at its level, and that assumption is a weakness I accidentally exploit occasionally. In everything else I am far worse than Stockfish.

That's the thing. You don't lose 15 points to a 2000 as an 800. Why don't you try challenging a 2000? You'll lose zero rating. You're not using the Elo OR the Glicko system.
I never said 2000 loses 15 points after losing against 800 or something like that. Quoting myself:
If a 200 beats you as an 800 (because he is underrated), it's not fair to lose 15 points.
That means you are 800 and your opponent is 200, but he is stronger, just underrated.

That's the thing. You don't lose 15 points to a 2000 as an 800. Why don't you try challenging a 2000? You'll lose zero rating. You're not using the Elo OR the Glicko system.
I never said 2000 loses 15 points after losing against 800 or something like that. Quoting myself:
If a 200 beats you as an 800 (because he is underrated), it's not fair to lose 15 points.
That means you are 800 and your opponent is 200, but he is stronger, just underrated.
800s aren't matched with 200s

Using ChatGPT powers I simulated 1,000,000 chess games in a pool of 1,000 players. Pairing was rating-based with a small diffusion to emulate the online-presence factor. Win/loss updates were exactly as prescribed by Elo. All players had a hidden strength in Elo: 90% of players from 1000 to 1400, 10% of players from 1400 to 2800. Initial rating was 200, rating floor was 100.
Where are we getting this "hidden strength" from? What source do you have here? If you're using accuracy or a Stockfish-estimated Elo, I could blunder every second move, but because my opponent blundered EVERY move it might say I was brilliant. Stockfish will say that I am a 1400 when my accuracy says I'm 300 and my current Elo is 500. Actually, no, it isn't even Elo, it's Glicko; every time I say Elo I mean Glicko, because chess.com uses Glicko.
Here on chess.com it is Elo. Glicko is built on Elo: it has some specific rules about updating players' Elo, but Elo is still Elo in Glicko.
Hidden strength is just the assumed Elo of each player. Where it starts doesn't matter; what matters is that a broad range of skills gets squeezed into the 100-200 range as a result of such a low starting rating and rating floor (100). That hidden strength is a fixed strength assigned to each player in the simulation that determines the outcomes of simulated games; it is not some evaluation by the chess UI that compares your moves to the engine's moves to calculate a score. Just like in real life, we have stronger players and weaker players, and if they were rated perfectly relative to each other, their Elos would represent the probability of winning against each other. In the simulation, hidden strength is that perfect, true measure of strength for each player, because even an unrated player has some definite strength. Rating is separate: the simulation starts every player at rating = 200, but all players have different actual strengths. Just like it happens in real life.
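To make the two roles of the formula concrete: the same Elo expectancy applied to the hidden numbers decides who wins each game, and applied to the displayed numbers it decides the rating update. A short illustration:

```python
def expected(ra, rb):
    # Standard Elo win expectancy; it depends only on the rating difference
    return 1 / (1 + 10 ** ((rb - ra) / 400))

# Hidden strengths 1400 vs 1000: the stronger player wins about 91% of games,
# even while both display the same starting rating of 200 (a coin flip on paper).
print(expected(1400, 1000))  # ~0.909
print(expected(200, 200))    # 0.5
```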

Using ChatGPT powers I simulated 1,000,000 chess games in a pool of 1,000 players. Pairing was rating-based with a small diffusion to emulate the online-presence factor. Win/loss updates were exactly as prescribed by Elo. All players had a hidden strength in Elo: 90% of players from 1000 to 1400, 10% of players from 1400 to 2800. Initial rating was 200, rating floor was 100.
Where are we getting this "hidden strength" from? What source do you have here? If you're using accuracy or a Stockfish-estimated Elo, I could blunder every second move, but because my opponent blundered EVERY move it might say I was brilliant. Stockfish will say that I am a 1400 when my accuracy says I'm 300 and my current Elo is 500. Actually, no, it isn't even Elo, it's Glicko; every time I say Elo I mean Glicko, because chess.com uses Glicko.
Here on chess.com it is Elo. Glicko is built on Elo: it has some specific rules about updating players' Elo, but Elo is still Elo in Glicko.
Hidden strength is just the assumed Elo of each player. Where it starts doesn't matter; what matters is that a broad range of skills gets squeezed into the 100-200 range as a result of such a low starting rating and rating floor (100). That hidden strength is a fixed strength assigned to each player in the simulation that determines the outcomes of simulated games; it is not some evaluation by the chess UI that compares your moves to the engine's moves to calculate a score. Just like in real life, we have stronger players and weaker players, and if they were rated perfectly relative to each other, their Elos would represent the probability of winning against each other. In the simulation, hidden strength is that perfect, true measure of strength for each player, because even an unrated player has some definite strength. Rating is separate: the simulation starts every player at rating = 200, but all players have different actual strengths. Just like it happens in real life.
Ah, so you just randomly choose hidden strength for every player.

Also, wouldn't a player's real skill change?
Sure, skill could grow or even drop. But that wouldn't "fix" the problem.

Ah, so you just randomly choose hidden strength for every player.
Yep, though not purely randomly: I added more weaker players. I tried a log-normal distribution too, and even one without any conditions, just equal probability of very strong and very weak players. It doesn't change the outcome much; the result of the simulation always shows that some "sorting" happens that squeezes a lot of players into the lowest tier.
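If it helps to picture the three setups mentioned, the hidden strengths could be drawn like this (the log-normal parameters here are a guess; the post doesn't give them):

```python
import random

def skewed():      # 90% from 1000-1400, 10% from 1400-2800, as in the original setup
    if random.random() < 0.9:
        return random.uniform(1000, 1400)
    return random.uniform(1400, 2800)

def log_normal():  # a log-normal tail above a 1000-Elo base; parameters assumed
    return 1000 + random.lognormvariate(5.5, 0.5)

def flat():        # "without any conditions": every strength equally likely
    return random.uniform(1000, 2800)
```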

I notice that the hidden-strength graph assumes that all players are past 1000 Elo in actual strength, which is inaccurate,
and the graph only shows a specific point in time in your simulation, which may be misleading, since later on the players may be more evenly spread. Also, the massive rating deflation from your Elo graphs not matching up properly makes it look bad, but the players are actually more evenly spread out in actual Elo than in hidden Elo; if not for a mistake in graph-making causing massive rating deflation, many would have more Elo than they should. Your graphs are nonsense.

I notice that the hidden-strength graph assumes that all players are past 1000 Elo in actual strength, which is inaccurate,
and the graph only shows a specific point in time in your simulation, which may be misleading, since later on the players may be more evenly spread. Also, the massive rating deflation from your Elo graphs not matching up properly makes it look bad, but the players are actually more evenly spread out in actual Elo than in hidden Elo; if not for a mistake in graph-making causing massive rating deflation, many would have more Elo than they should. Your graphs are nonsense.
Elo is relative; there is no absolute, defined Elo. The Elo differences between players are accurate in the simulation. It makes no difference: I could have started their hidden strengths from 100 Elo or 1 Elo and the result would be the same. You can't call it inaccurate, because the starting point doesn't affect the ratings anyway; only the hidden strength affects the outcome of games. You have clearly misunderstood my concept and data; please examine them carefully before using such loud words as "nonsense".
In short: the resulting Elo no longer reflects the actual strength differences. It acts more like a sorting, and low-Elo players (actually a broad range of skill) get squeezed and distorted; that's why the left side of the green graph is peaking.
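The relativity point is easy to check: the Elo expectancy sees only the difference between two ratings, so shifting every hidden strength by the same constant changes no win probability:

```python
def expected(ra, rb):
    return 1 / (1 + 10 ** ((rb - ra) / 400))

# The same 400-point gap at three different places on the scale:
print(expected(1400, 1000) == expected(500, 100) == expected(401, 1))  # True
```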

ChatGPT can barely add 5+5; it can't simulate this, lol.
Seems like it has been some time since your neural knowledge base about ChatGPT was last updated.
Nah... Your simulation had players with actual strength around 1000 end up at Elo 100 after 1000 games? Something's badly wrong with the script. Without doing the math, I know the probability of this is nonexistent considering the rating system. How big a sample size of players anyway?
Edit: Nvm, it was 1000 players and 1,000,000 games. Either way, I don't find this plausible.

"Game is rigged" Forum Posters when your argument contains a single flaw that doesn't actually exist but your argument is too strong for them to ignore: 🤓👆
"Game is rigged" Forum Posters when your argument contains strong, irrefutable evidence against them: 🫥

ChatGPT is evolving. The free version can't do much, sure, but the most advanced one runs analysis; you need to wait for a while, and it returns with graphs and tables.
What's the chance that those graphs and tables are just completely made up random nonsense?
ChatGPT is a language model, not a statistical tool.

this keeps happening until the assumption becomes true (their skill and rating are the same).
This is sort of the whole point of the rating system and why/how it works.
Maybe that was the intention behind the system, but it doesn't happen. Not in simulation, not in real life.
One issue: if you assume everyone's rating is established (and everyone has the same low RD, if you're using Glicko), then it's a zero-sum situation, i.e. the number of points the loser loses equals the number of points the winner wins, so the total number of rating points stays the same no matter how many games are played.
When this happens, for the underrated players to gain rating they have to take it away from higher-rated players; in other words, rating deflation. The cost of this deflation is shared equally among all players (you just have to run the simulation long enough). In the end you'll get a distribution that correctly represents the players' skills, but it will be shifted down (everyone a few points lower than they "should" be, for example).
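The zero-sum claim is exact in plain Elo whenever both players share the same K-factor, which is easy to verify (K = 16 assumed here):

```python
K = 16  # any shared K-factor gives the same cancellation

def expected(ra, rb):
    return 1 / (1 + 10 ** ((rb - ra) / 400))

ra, rb = 1200, 1000
ea = expected(ra, rb)
gain_winner = K * (1.0 - ea)         # the 1200 beats the 1000
loss_loser = K * (0.0 - (1.0 - ea))  # the loser drops exactly the mirror amount
print(gain_winner + loss_loser)      # 0.0 -- total rating points conserved
```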
RD becomes low too fast here, so it plays no big role.
The total number of rating points doesn't stay the same. Real-life example:
https://www.chess.com/analysis/game/live/119470760650
One player was 104 and lost only 4 points, not 8, because the rating floor is 100. The other player was 118, and he still gained +8. When you open the link, you will see the ratings after the game finished: 100 and 126.
If you claim you did a simulation, please share the results: tables, graphs, etc.
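That floor effect in the linked game is easy to reproduce with plain Elo arithmetic (chess.com actually uses Glicko; K = 16 here is an assumption that happens to match the ±8 swing in that game):

```python
FLOOR, K = 100, 16

def expected(ra, rb):
    return 1 / (1 + 10 ** ((rb - ra) / 400))

loser, winner = 104, 118
stake = round(K * expected(loser, winner))                      # ~8 points at stake

new_loser = max(FLOOR, loser - stake)                           # 104 -> 100: only -4 leaves the pool
new_winner = winner + round(K * (1 - expected(winner, loser)))  # 118 -> 126: +8

print(new_loser, new_winner)                        # 100 126
print((new_loser + new_winner) - (loser + winner))  # +4: points created, not conserved
```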