The data is not coming from the language model itself. It's a result of actual program execution. So it is indeed a "custom written program" in this case.
And I think you overestimate the complexity of this task.
Can you share the code?
For an accurate simulation, we don't just have to account for skill. We also have to account for changes in skill, good days, bad days, and other multipliers such as feelings and focus shifting on winning or losing streaks. People aren't just numbers. I've mentioned rating deflation, but I haven't explained its main cause and why it doesn't exist in reality. In your simulation there is a finite number of players who start at 200 Elo and then play against each other, with their ratings changing over time. If there wasn't deflation, your graphs would just look ridiculous. There's rating deflation because even if every player's skill was your set minimum, 1000, we'd need 1800000 more Elo (if I got my math right) in the system. As you have mentioned, some extra Elo is added at the floor, but it's too small to make up 1800000 Elo. What is required is lots of new players added every tick. Also, as my previous messages have pointed out, your hidden skill range assumes complete beginners don't exist, which would probably affect the simulation. Beginners, real people, and new players joining frequently all exist in real chess, while they do not in your simulation.
Think about this: if two players, both initially rated 200, with a rating floor of 100, play 1000 games against each other with K=16 like here on chess.com, is there any chance their final ratings will be similar if they have equal strength and the win rate is 50% with +/-0.1% error? You would think: "of course, maybe a small deviation, but yeah, the ratings should be pretty close". Wrong. 170 vs 230, 214 vs 182, 151 vs 248 - these are possible final ratings even after 1000 games. That is like +/-15% prediction error. 35, 50, 65% - who knows. So inaccurate, even for equally strong players.
With 1500 as starting rating, however, prediction inaccuracy is much smaller in this simulation (~1%).
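The two-player experiment can be sketched roughly like this. It is not the script that produced the graphs in this thread. It assumes the standard Elo expected-score formula, no draws, and a fair coin flip for each game between equally strong players. The names `expected_score` and `simulate_pair` are illustrative only.

```python
import random


def expected_score(r_a, r_b):
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def simulate_pair(start=200.0, floor=100.0, k=16.0, games=1000, seed=0):
    """Two equally strong players play each other `games` times.

    Assumptions (not taken from the original script): no draws, each game
    is a 50/50 coin flip, and the floor is applied by simple clamping.
    """
    rng = random.Random(seed)
    a = b = start
    for _ in range(games):
        score_a = 1.0 if rng.random() < 0.5 else 0.0
        e_a = expected_score(a, b)
        # Both updates use the pre-game ratings via e_a.
        new_a = a + k * (score_a - e_a)
        new_b = b + k * ((1.0 - score_a) - (1.0 - e_a))
        a, b = max(floor, new_a), max(floor, new_b)
    return a, b


if __name__ == "__main__":
    for seed in range(5):
        a, b = simulate_pair(seed=seed)
        print(f"seed {seed}: {a:.0f} vs {b:.0f}")
```

Changing `start` to 1500 lets you compare the two starting ratings directly under the same assumptions.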
But the issue is not only about starting ratings: localized graphs create distortions in the global fabric of Elo ratings.
For an accurate simulation, we don't just have to account for skill, we have to account for the changing of skill, good days, bad days, and other multipliers such as feelings and focus changing on winning/losing streaks.
Additional variables don't solve the fundamental flaws of the system described above.
200 Elo as start, 100 Elo as floor, +/-8 Elo as the rating change when playing a roughly equally rated opponent. Deal with it. Sort people by skill with such input. You can't. Only those who are significantly stronger will move into a more or less accurate tier. The rest will be squeezed into a tight, chaotic range for years. No need to use any simulations to understand this. And yes, I tried introducing more 200-Elo players later, once the pool had formed. It doesn't change anything, because there are still low-Elo players with significant randomness in their skill.
What do we count as close ratings? I say a 60 Elo difference is close, even a 100 Elo difference.
A 100 Elo difference is a ~14% shift in winning chances, and a 60 Elo difference is a ~8.5% shift. Just because 60 or even 100 seems like a small number (compared to what, a million?) doesn't mean it is something that can be neglected when we measure a player's performance.
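Those percentages follow from the standard Elo expected-score formula, E = 1 / (1 + 10^(-d/400)), where d is the rating gap. A minimal sketch for checking them (the function name is illustrative, and "shift" here means the distance of the expected score from 50%):

```python
def expected_score(diff):
    """Expected score of the higher-rated player for a rating gap `diff`."""
    return 1.0 / (1.0 + 10 ** (-diff / 400.0))


for diff in (60, 100):
    e = expected_score(diff)
    # Shift relative to an even 50% expectation between equal ratings.
    print(f"{diff} Elo gap: expected score {e:.4f}, shift {100 * (e - 0.5):.2f} points")
```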
Your simulation had players with an actual strength of ~1000 ending up near the starting Elo. There is no way the math or the parameters check out for the script you used.
You've misunderstood the math, and your comment demonstrates a common misconception: the assumption that the Elo rating system "somehow" "magically" learns some absolute value, a player's true strength. You'd expect all ratings to grow, leaving the starting Elo behind? That is not possible. It's important to understand that Elo is relative. You'd have to keep extending the room on both sides to keep sorting all the players by their skill. This does not happen on chess.com.
And 1000 strength in the simulation was actually the starting strength. So it's normal and expected that the weakest (1000 strength) players received mostly low ratings. It's not normal, however, that noticeably stronger players also remained in that low-Elo category. It's not a flaw of the simulation, because we see an identical situation in real life here on chess.com: the apparent strength of low-Elo players differs significantly.
The point is: in a fair system, a difference in strength must be reflected in the rating difference. Here that is simply not possible because of the severe localisation of graphs that is inevitable in a player base of this size. The low starting Elo and the rating floor contribute to the problem.
Strength is a very badly used term. Chess strength varies. For example, I am around 1500 blitz, but 2000+ rapid. You start low because low elo = less skill. If you started high, then you would get pushed down by higher rated players. Ratings are low and high because of relativity. Also, FIDE does NOT use rating floors. It is reflected. 1800's will beat 1400's most of the time, 400's will lose to 2000's most of the time... One key thing: There are always upsets. People can change based on their environment and energy. Just because you keep losing to low-rated players, doesn't mean the system is oppressive and actively trying to bring you down. We ALL use the same system. Math doesn't discriminate.
The definition of oppressive: unjustly inflicting hardship and constraint, especially on a minority or other subordinate group
It's not oppressive.
Hello everyone. I was doing a bit of reading in this article, and I have some thoughts about the way people seem to play the time controls. I have noticed that people on this site play long time controls (slow chess: 30 minute, 60 minute and so on) as if they were playing speed chess (1 minute, 3 minute, 5 minute and so on), but play speed chess as if they were playing slow chess. I am definitely no statistician or expert, nor am I an IM or GM, but I do love playing this game called chess. The way I see it, most people play the game for their own specific chess goals, but anybody who doesn't make every attempt to play for the checkmate win in every one of their games is cheating themselves while trying to cheat others in the process. Unfairly at that. I also find that by playing bullet games, I leave myself with no excuse for not playing for the checkmate win as opposed to the lesser ways of winning. Who actually likes to play their bullet games for the time win, just to lose on time? Is it not better to win games by resignation than by your opponent flagging? I believe that checkmate wins are more valuable; is it or is it not the objective of the game of chess to win by checkmate? Why doesn't anybody talk about how, since this is an online chess site, nobody knows who is actually playing fairly according to chess.com's fair play policy and who is not? Especially compared to actual OTB chess games, where it is truly easier to oversee the behavior of the players involved because it is face to face and in person. I figure if people on this site want to play this game online, we all should remember that online chess is supposed to be played the same way as OTB chess.
With chess being played online more now than ever before, it is much easier for people to use computer software (chess engines). This is nothing new, but imagine if online chess had been around 40 or more years ago like it is today... If chess.com and other chess-related organizations had existed back then, perhaps I could have been the next American world chess champion after Bobby Fischer. Only God knows, really. I am not trying to overcomplicate my understanding of chess, or diminish others' understanding either, but hopefully to provide some feedback and my own personal comments. Whether anyone agrees or disagrees in whole or in part, I respect that. Personally, if people want to play in a slow thinking and moving manner, then they should play a long, slow time control. If they want to play in a speedy thinking and moving manner, then there is the option of playing speed chess. In my own experience, the only thing that really makes sense to me is that if people play bullet chess, for example, then there is no excuse for anybody to play slowly, let alone not take the opportunity to win their bullet games by checkmate.
Strength is a very badly used term. Chess strength varies. For example, I am around 1500 blitz, but 2000+ rapid. You start low because low elo = less skill. If you started high, then you would get pushed down by higher rated players. Ratings are low and high because of relativity. Also, FIDE does NOT use rating floors. It is reflected. 1800's will beat 1400's most of the time, 400's will lose to 2000's most of the time... One key thing: There are always upsets. People can change based on their environment and energy. Just because you keep losing to low-rated players, doesn't mean the system is oppressive and actively trying to bring you down. We ALL use the same system. Math doesn't discriminate.
Thanks for your comment, dear friend. But your comment reflects some common misconceptions about ratings, skill, and the way Elo systems work. Let’s break down the claims and address where they go wrong:
Strength in chess refers to a player's ability to win games against others in a given format or time control. Just because someone’s rapid rating is significantly higher than their blitz rating doesn’t mean the term "strength" is misused—it simply means they are stronger in one format than another.
The starting rating is arbitrary and is set by the system (for example, FIDE's starting rating for new players is typically around 1000-1200). A player’s rating only becomes a reflection of their skill after they’ve played a sufficient number of games and the system has adjusted to their true level. Starting everyone at a low Elo allows the system to avoid overestimating new players' strength, but it’s not a fundamental rule of Elo systems that “low Elo = less skill.”
A player’s starting rating could just as easily be set at a higher value, and they would still converge to their correct rating through wins or losses. The notion that starting low is necessary because high-rated players would push you down is true in practice but only because the system needs time to adjust to actual performance, not because starting low is inherently tied to less skill.
Ratings can reflect skill more accurately in tightly connected systems where players regularly face opponents from a wide range of skill levels. In larger, fragmented systems, ratings may become inflated or deflated due to limited crossover between skill brackets.
In effect, this does impose a floor on rating visibility, even if it’s not a technical rating floor. Additionally, once players are over 1000, there’s no minimum threshold below which they can’t drop. However, FIDE’s starting rating and lack of a formal floor make it a more gradual system, unlike others that might cap how low a rating can go.
In large pools with limited crossover, ratings might not reflect true skill accurately, meaning a 1800-rated player from one isolated pool may not be as strong as a 1800-rated player from another pool. The same applies to 1400-rated players. The math behind Elo systems doesn’t account for these external distortions, and it can create a sense of unpredictability when players from different pools meet.
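A minimal sketch of that isolated-pool effect, under stated assumptions: two pools never play each other, everyone starts at the same rating, there is no rating floor, no draws, and game results are drawn from hidden "true strengths" using the same Elo curve. The function names and the pool parameters are hypothetical and not taken from any site's actual system. Because every update here is zero-sum, each pool's average rating stays at the starting value no matter how strong its players really are.

```python
import random


def expected_score(r_a, r_b):
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def run_isolated_pool(true_strengths, start=1500.0, k=16.0, games=20000, seed=0):
    """Play random pairings inside one closed pool and return final ratings."""
    rng = random.Random(seed)
    ratings = [start] * len(true_strengths)
    for _ in range(games):
        i, j = rng.sample(range(len(ratings)), 2)
        # The result depends on hidden strength, not on the visible rating.
        p_i = expected_score(true_strengths[i], true_strengths[j])
        score_i = 1.0 if rng.random() < p_i else 0.0
        delta = k * (score_i - expected_score(ratings[i], ratings[j]))
        ratings[i] += delta  # zero-sum: what i gains, j loses
        ratings[j] -= delta
    return ratings


if __name__ == "__main__":
    weak = run_isolated_pool([1000 + 20 * n for n in range(20)], seed=1)
    strong = run_isolated_pool([1800 + 20 * n for n in range(20)], seed=2)
    print(f"weak pool mean rating:   {sum(weak) / len(weak):.1f}")
    print(f"strong pool mean rating: {sum(strong) / len(strong):.1f}")
```

In this sketch the same displayed rating in the two pools corresponds to hidden strengths about 800 points apart, which is the distortion the paragraph above describes.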
Additionally, upsets and variance are a key part of Elo, as noted, but this doesn’t mean the system is immune to external problems like insufficient game data, fragmented player bases, or mismatched pairings.
For example:
So while the mathematical calculations of Elo are impartial, the system itself can be skewed by external factors.
The comment oversimplifies the way Elo and ratings work in complex, large-scale environments. While Elo is a useful tool for predicting win probabilities, it has limitations when applied to large, fragmented pools, and ratings can become distorted if the system isn't properly managed. Key points like the importance of player crossover, the accuracy of win probabilities, and environmental factors like player base size and distribution are missing from the comment. Moreover, while math is impartial, the application of the system can introduce biases and distortions that affect the fairness of the outcomes.
They don't have pools based on region. I've played people from Libya, Saudi Arabia, South Africa, India, America, Canada
The vast majority of players do not have localized pools. In fact, localized pools produce higher ratings
I told AI to make the above comment shorter, if anyone wants to read the short version:
The author observes that players on the chess site often mismatch their time controls, playing slow games with a fast-paced mindset and vice versa. They emphasize the importance of striving for checkmate wins rather than relying on time victories, arguing that true success in chess should focus on achieving checkmate. The author also highlights concerns about fair play in online chess compared to over-the-board games, suggesting that online play should mirror traditional standards. Ultimately, they advocate for players to choose their time controls based on their preferred pace of play.
Yes, that could be a problem. It's easy to track if you count your losses by timeout vs your wins by the opponent's timeout. That could tell you whether you're faster than your opponents.
Looking at my bullet stats, you could tell I play with a hyper-bullet mindset while my opponents tend more toward a blitz mindset:
They don't have pools based on region. I've played people from Libya, Saudi Arabia, South Africa, India, America, Canada
Easy: timezone difference, people need to sleep, people are more productive when they're awake not sleepy etc, that's how you get into different pools here on chess.com. Look up stats against different countries btw, if you played a lot you'll see certain patterns like one country might appear stronger in general.
Also, maybe people are stuck at low elo in simulation because there isn't enough elo in the system?
They are stuck because chess is one of the most played games in the world, and it's difficult to improve at because it's so competitive.
Your message is simply dismissive, it ignores the problem, that's not a way to deal with this issue (to dismiss, to laugh, to compare it to conspiracy etc). Read the conclusion in my previous post, at least the conclusion, it's easy to understand and it has very strong points.