Why Elo and CAPS fail: My solution

Zipho_Lunika

The problem with Elo

Most people treat Elo as a kind of absolute measure. Even Magnus Carlsen believes he may reach 2900, as if it were an absolute milestone. There is a psychological tendency to treat a rating as indicative of someone's true skill level.

Elo is actually a relative measure. It depends on the current pool of players you play against, and it is not the same everywhere. A player rated 1800 FIDE in one region may well be stronger than a 2200 in another jurisdiction or rating pool.
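The standard Elo formulas make this relativity explicit. For player A (rating R_A) facing player B (rating R_B), with S_A the actual score (1, 1/2 or 0) and K the K-factor:

```latex
E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}, \qquad
R_A' = R_A + K\,(S_A - E_A)
```

Only the difference R_B - R_A appears. Adding the same constant to every rating in a pool therefore changes no expected score and no rating update, so a number like 2200 only has meaning relative to the pool in which it was earned.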

I once heard a player from India, rated 1400, claim that the 1400s in Australia were easier to beat! In fact, he said his rating climbed dramatically after he immigrated to Australia, reaching 1900!

Of course, we need not delve into the inherent flaw of comparing the Elo ratings of past and present players. Claims like "so-and-so has a higher Elo than Anatoly Karpov, so they must be better than Karpov" ignore developments in chess theory, analysis, preparation, engine use, and more.

I once posed a thought experiment: how long would it take an 1800-rated player to reach 2200 if all his opponents were rated 1300-1600 and he never lost a game? The answer is a very long time, depending on a few assumptions such as the K-factor. If we assume every opponent is rated exactly 1450, he may need over 80 straight wins with a K-factor of 40 and over 150 straight wins with a K-factor of 20.

Over 100 straight wins with classical time controls!

Now suppose each player gets 90 minutes for the whole game. A game then lasts at most 3 hours, so 100 games alone could take up to 300 hours of play.
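The thought experiment's arithmetic can be checked with a small Python sketch. It rests on several assumptions. After every win the rating updates by R' = R + K(1 - E), with no rounding. Every opponent is rated exactly 1450. An optional `cap` applies FIDE's rule that a rating gap larger than 400 counts as 400. Each win is worth fewer points as the gap widens, so the loop counts wins one at a time rather than dividing 400 by the first game's gain. The hours use the 3-hour maximum per game from the paragraph and ignore increments. All function names are illustrative only.

```python
from typing import Optional

MAX_HOURS_PER_GAME = 3  # 90 minutes per player, no increment (as assumed above)


def expected_score(rating: float, opponent: float, cap: Optional[float] = None) -> float:
    """Elo expected score for `rating` against `opponent`.

    If `cap` is given, rating gaps larger than `cap` are clamped to it
    (FIDE's 400-point rule corresponds to cap=400).
    """
    diff = opponent - rating
    if cap is not None:
        diff = max(-cap, min(cap, diff))
    return 1.0 / (1.0 + 10.0 ** (diff / 400.0))


def wins_needed(start: float, target: float, opponent: float, k: float,
                cap: Optional[float] = None) -> int:
    """Count consecutive wins until the rating reaches `target`."""
    rating, wins = start, 0
    while rating < target:
        # A win scores 1, so the gain is K * (1 - E); it shrinks as the rating rises.
        rating += k * (1.0 - expected_score(rating, opponent, cap))
        wins += 1
    return wins


def max_hours(games: int) -> int:
    """Upper bound on time at the board for a given number of games."""
    return games * MAX_HOURS_PER_GAME


if __name__ == "__main__":
    for k in (40, 20):
        for cap in (None, 400):
            n = wins_needed(1800, 2200, 1450, k, cap)
            print(f"K={k}, cap={cap}: {n} straight wins, up to {max_hours(n)} hours")
```

With `cap=None` the counts come out considerably higher than with `cap=400`. Without the cap, a win against a much weaker opponent is worth very little.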

But would the final rating be a true measure of his strength? Of course not! If he only played weaker players, his strength is not truly 2200 in the sense we usually mean.

So here we see a flaw in the Elo system: it does not really measure a player's strength. It is relative to the pool you find yourself in.

Additionally, the Elo system is vulnerable to rating manipulation and collusion. Sandbagging is already a problem in online chess, and prearranged results and other shady tactics expose the system further.

The problem with CAPS

There is also an issue with the chess.com accuracy system. Countless players rated 1300 have a much higher average accuracy than I do. I am rated at least 2200 in chess.com blitz, and my average accuracy was only about 77% when I was rated 2400. Yet some 1300 players average over 80%. How is this possible? Moreover, according to my Insights, I perform worse on average than players of my strength in every phase of the game. So the CAPS score seems like a dubious metric.

My solution

So how do we normalize, or standardize, a chess player's strength? Which metrics are more reliable? I propose pairing players with computers: have each player play selected bots of different strengths, then use the results of those matches to gauge their true strength.
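One way to turn bot results into a single number is a maximum-likelihood performance estimate. The Python sketch below illustrates the idea and is not a worked-out rating system. It assumes that each bot has a fixed, known rating on a common scale (the bot ratings in the example are hypothetical), that results follow the standard Elo expected-score curve, and that a draw counts as half a point. Under that model, the most likely rating is the one at which the player's total expected score equals the actual score. Bisection can find that rating because the expected total rises steadily with rating.

```python
from typing import Sequence, Tuple


def expected_score(player: float, bot: float) -> float:
    """Standard Elo expected score of the player against a bot."""
    return 1.0 / (1.0 + 10.0 ** ((bot - player) / 400.0))


def estimate_rating(games: Sequence[Tuple[float, float]],
                    lo: float = 0.0, hi: float = 4000.0, tol: float = 1e-6) -> float:
    """Maximum-likelihood rating from (bot_rating, score) pairs.

    score is 1 for a win, 0.5 for a draw, 0 for a loss. The search is
    limited to [lo, hi]; an estimate outside that range ends up at the boundary.
    """
    total = sum(score for _, score in games)
    if total <= 0 or total >= len(games):
        # All losses or all wins: the likelihood has no finite maximum.
        raise ValueError("need at least one non-win and one non-loss")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        expected = sum(expected_score(mid, bot) for bot, _ in games)
        if expected < total:
            lo = mid  # candidate too low: the model expects fewer points than scored
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # Hypothetical bots rated 1600-2200; the player wins two, draws one, loses one.
    results = [(1600, 1), (1800, 1), (2000, 0.5), (2200, 0)]
    print(round(estimate_rating(results)))
```

The bots' own ratings must be fixed against one another beforehand. Otherwise the estimate inherits the same pool dependence the post criticizes.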

Given enough games and enough bots, it will be statistically easier to tell how strong a player really is!
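The idea that more games make the estimate sharper can be quantified with the usual asymptotic standard error of a maximum-likelihood estimate. This sketch uses the same Elo model as before. It counts only wins and losses, with no draws, and evaluates the Fisher information at an assumed true rating. The bot ratings are again hypothetical.

```python
import math
from typing import Sequence

# Slope constant of the Elo curve: E = 1 / (1 + exp(-C * (player - bot))).
C = math.log(10) / 400.0


def expected_score(player: float, bot: float) -> float:
    return 1.0 / (1.0 + 10.0 ** ((bot - player) / 400.0))


def rating_standard_error(player: float, bots: Sequence[float]) -> float:
    """Approximate standard error (in rating points) of the estimated rating.

    Each game against a bot contributes Fisher information C^2 * E * (1 - E).
    """
    info = 0.0
    for bot in bots:
        e = expected_score(player, bot)
        info += C * C * e * (1.0 - e)
    return 1.0 / math.sqrt(info)


if __name__ == "__main__":
    schedule = [1600, 1800, 2000, 2200]  # hypothetical bot ratings
    for rounds in (1, 10, 100):
        se = rating_standard_error(1900, schedule * rounds)
        print(f"{4 * rounds} games: +/- {se:.0f} points (one standard error)")
```

Information from repeated games adds up, so the error shrinks like 1/sqrt(n). Ten times as many games cuts it by a factor of about 3.16. Each game is most informative when E is close to 0.5, which suggests choosing bots near the player's level.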

So if you come from a country with a weaker rating pool or chess culture, you don't have to fret about being unable to meet FIDE's unrealistic standards of 2200, 2300 or 2400 Elo. You simply go through this measurement system, and your true strength is determined. You will not be held back by your countrymen or your local players.

Perhaps my solution is not the best, but it aims at a more standardized, absolute measure of any chess player's true statistical skill level, one that does not depend on their local pool.