Photo by 64 magazine, 1991

Lev Polugaevsky. "How to Put the Elo Genie Back Into the Bottle". 1991

Spektrowski

Sep 2, 2021, 5:59 AM

This article was written by Polugaevsky almost exactly 30 years ago for 64 magazine (No. 18, September 1991). While some of the problems he described have been solved since then, others are still being actively discussed.

How to Put the Elo Genie Back Into the Bottle

In the last quarter of a century, chess has broadly expanded its geographical borders and its fan base. The multitude of competitions featuring huge numbers of players made it very difficult to gauge their relative strength.

Thus, professor Elo created his coefficient system for the purpose of preventing the chaos that threatened the chess world and grading the strong players.

We can say for sure that the introduction of Elo rankings did play its cementing role: it unified the army of active players, placing them on their deserved "rungs". Everyone got their personal strength badge, of sorts.

However, time passed, and now it's clear that while this system has a lot of merits, it has quite a few disadvantages as well.

First of all, we saw a serious defect in the very method of rating calculation: the results of all tournaments played in a half-year period are calculated together, with the last published Elo rating as the base value. It doesn't matter if you play one successful tournament after another - you still continue playing with your old rating (which is obviously lower than your actual current strength) for the entire half-year. Many chess players have abused that system to chase after big ratings. As a result, we got unchecked, wild rating "pumping" which led to a very dangerous inflation.

I remember one comical occasion. Once, I met a friend who knew a lot about chess and kept up with all current events.

"Do you know Kasparov's current rating?" he asked me, then exclaimed, "2800!" I made a surprised face, even though I'd already heard the news about the world champion's fantastic record. And then I frowned a bit and answered, "I'll get a bigger one!" "You're joking, of course?" my friend asked. "No, I'm very serious", I said and then explained the program of my actions:

"First of all, I go play some tournaments with my 2600 rating and deliberately play badly, to lower it to about 2400. Then, I take a deep breath, plan the starting date of my spurt, and in the next half a year, I sign up for fifteen 9-round Swiss tournaments with an average rating of about 2400. Of course, since my true rating is 2600, I easily score 8 or 8.5/9 points in every tournament, gaining 30 to 40 rating points for each. As a result, after the calculations, I surpass the 2800 mark and become... the world champion!"

"Really?!" my friend whispered, looking at me with wide eyes and, of course, understanding that I was joking after all. But is this utopia really so far from the truth? We've already seen some grandmasters and masters making great leaps in rating, 100 to 150 points. There's also an officially recognized record: 240 points! A chess player went from 2300 to 2540 in just a few months! At this rate, we'll surely reach the Guinness Book level at some point...
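
The arithmetic behind the joke can be checked with a short sketch. It assumes the standard logistic Elo expectation formula and a coefficient K = 10; neither is stated in the article, though K = 10 does reproduce the "30 to 40 points" per tournament. The function names are purely illustrative.

```python
def expected_score(rating, opponent_rating):
    """Expected score per game (logistic Elo formula, an assumption here)."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def tournament_gain(base_rating, field_rating, points, rounds=9, k=10):
    """Rating change for one event, measured against the given base rating."""
    expected_points = rounds * expected_score(base_rating, field_rating)
    return k * (points - expected_points)


def half_year_rating(published, field_rating, results, rounds=9, k=10):
    """Half-yearly method: every event is measured against the same
    published rating, and the gains are simply added up."""
    total = sum(tournament_gain(published, field_rating, p, rounds, k)
                for p in results)
    return published + total


# Fifteen 9-round events against a 2400 field, scoring 8/9 in each,
# all calculated from the published rating of 2400.
gain_for_8 = tournament_gain(2400, 2400, 8.0)     # 35 points
gain_for_8_5 = tournament_gain(2400, 2400, 8.5)   # 40 points
final_rating = half_year_rating(2400, 2400, [8.0] * 15)
print(gain_for_8, gain_for_8_5, final_rating)
```

Under these assumptions, each 8/9 result is worth 35 points, so fifteen of them carry a 2400 rating past the 2800 mark in a single rating period.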

Even in the upper echelon, we see obvious inflation. Just 10 years ago, you could count the 2600-rated players on your fingers. [Polugaevsky was almost right: in the July 1981 rating list, only 11 players surpassed 2600 - Karpov, Korchnoi, Huebner, Timman, Kasparov, Spassky, Portisch, Beliavsky, Larsen, Polugaevsky and Andersson.] A 14th category tournament was once a dream for any strong grandmaster, but now 16th category tournaments are no rarity, the sponsors are already considering 17th or 18th category, and sooner or later, we'll get 20th category...

The rating inflation essentially upset the apple cart, creating a rather murky picture when trying to determine the chess player's true strength. The top ten sometimes features people who never took part in a Candidates' event or in the World Cup final. I can show dozens of examples of very strong grandmasters who had a couple of bad tournaments and essentially negated all their previous achievements, letting a whole group of players who managed to "pump" a certain amount of points overtake them, even though they still obviously remained stronger. In other words, rating, in many cases, ceased to be a correct measure of a player's strength.

And to that, we must add the numerous technical mistakes made during the calculations; lately, they have become more widespread because of the sheer number of tournaments.

Some may argue, "What's the tragedy here? Should we talk about this so much?" The thing is, we cannot keep silent anymore if we want to avoid an "Elo catastrophe" in our art. Rating now plays a decisive role in our fate: it determines the personal invitations for the World Cup final cycle tournaments, the line-ups of international tournaments, and so on, and so forth. It's no joke: sometimes a measly five rating points cost a player dearly. Of two almost equally strong grandmasters, one qualifies directly for the World Cup final tournaments, while the other has no such right and is forced to go through the gruelling qualification.

Back to the topic of inflation, you can't help but ask: how can we escape this dead end? First of all, coefficients should be calculated step by step, tournament by tournament, not by summing up performance over half a year. This approach has long been used by the Moscow mathematician Eduard Dubov, who calculates internal ratings in the USSR.
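
A minimal sketch of the difference between the two methods, assuming the standard logistic Elo expectation formula and a coefficient K = 10 (both assumptions, as are the function names and the sample results). A player with a published rating of 2400 scores 8/9 against a 2400 field in each of fifteen tournaments.

```python
def expected_score(rating, opponent_rating):
    """Expected score per game (logistic Elo formula, an assumption here)."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def half_year_update(published, field_rating, results, rounds=9, k=10):
    """Half-yearly method: all events use the same published rating."""
    expected_points = rounds * expected_score(published, field_rating)
    return published + sum(k * (p - expected_points) for p in results)


def stepwise_update(published, field_rating, results, rounds=9, k=10):
    """Tournament-by-tournament method: the rating is refreshed after
    each event, so the next event expects more from the player."""
    rating = published
    for points in results:
        expected_points = rounds * expected_score(rating, field_rating)
        rating += k * (points - expected_points)
    return rating


results = [8.0] * 15  # hypothetical: 8/9 in each of fifteen events
batch = half_year_update(2400, 2400, results)
stepwise = stepwise_update(2400, 2400, results)
print(round(batch), round(stepwise))
```

Under these assumptions, the stepwise rating levels off toward roughly 2760, the point at which 8/9 against a 2400 field is exactly the expected score, while the half-yearly total keeps growing by the same 35 points per event.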

Back in 1986 at the FIDE Congress in Dubai, then in 1987 in Sevilla and 1988 in Saloniki, I raised this issue numerous times, both in private talks and officially. But my efforts were futile. Some qualifying committee members and even some grandmasters who oversee the FIDE qualifying process said again and again, "Why complicate our already complicated life even further? It won't have any effect." They pointed out that calculating coefficients after each tournament would be technically very difficult.

Reality, however, showed how deluded they were. The "technical difficulty" excuse looked rather naive in the age of the computer revolution. All in all, they wasted enough time, let the Elo "genie" out of its bottle, and now it's hard to correct the situation. Nevertheless, we should take quick action now.

And it's only half the problem... We shouldn't forget that by adopting the Elo system, we allowed mathematics to interfere with the creative process, which is not completely safe. Still, we did understand from the outset that even the most perfect mathematical formula couldn't be an ideal reflection of our chess existence.

Can, for instance, this formula account for the true changes in a young player's strength in a short amount of time? Let's consider this situation: a talented player who has a low rating trains diligently for several months, makes rapid progress and then plays in the same tournament as you. He's essentially a different player now, but the Elo system cannot take this into account; it only operates with known numbers.

A few years ago, the USSR championships had a relatively low average rating, because a number of very strong masters who played in them had no way to increase their rating in time - even though the real strength of those competitions was equal to that of the highest-caliber tournaments. And it was no accident that many famous grandmasters declined to play in the championship - they were afraid of losing rating points.

I think that one of the major downsides of the Elo system is that it values tournament victories or high-place finishes in and of themselves less than their point content. We still remember the times when a tournament winner could suddenly lose rating points. Thankfully, this shortcoming was corrected in time, but they forgot about matches. And so, we saw the winner of a World Championship match lose rating points... Isn't that absurd?

Let's look at another example. A grandmaster wins a very strong tournament with a modest result. By all measures, it's a huge success, but the Elo system might not value it as much as the same grandmaster's performance in another tournament, in which he's much stronger than his opponents and tries to score as many points as possible. This "slaughter of the innocents" can net him many more rating points. Is that fair?
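
The comparison can be made concrete with hypothetical numbers. This is a sketch only, assuming the standard logistic Elo expectation formula and K = 10; the ratings and scores below are invented for illustration and do not come from the article.

```python
def expected_score(rating, opponent_rating):
    """Expected score per game (logistic Elo formula, an assumption here)."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def tournament_gain(rating, field_rating, points, rounds=9, k=10):
    """Rating change for one event against a field of the given average."""
    return k * (points - rounds * expected_score(rating, field_rating))


# Hypothetical 2600-rated grandmaster in two 9-round events.
# Event A: wins a very strong tournament (2600 average) with a modest 5.5/9.
gain_strong = tournament_gain(2600, 2600, 5.5)
# Event B: crushes a much weaker field (2400 average) with 8.5/9.
gain_weak = tournament_gain(2600, 2400, 8.5)

print(round(gain_strong, 1), round(gain_weak, 1))
```

With these made-up figures, the victory over equals earns fewer rating points than the big score against the weaker field, which is exactly the imbalance described above.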

Or, say, let's look at a world championship contender who won three Candidates' matches with a minimal score, not "profiting" all that much from them. And then he suddenly performs poorly at some tournament and loses so much rating that he drops down in the rankings. It's easy to imagine him actually playing his world championship match while he's, say, number 10 in the Elo list. Will we be happy with such an arithmetic "truth"?

Yes, we chess players made a huge mistake when we didn't restrain pure mathematics from the outset. I think that the future reform of the Elo system should be geared towards considering the actual place taken in a tournament and towards a more in-depth and flexible analysis of the tournament's value. Priority, of course, should be given to the World Championship and World Cup cycles, big international tournaments, etc.

Let's turn to tennis for comparison. There, it doesn't matter if one player defeated the other 3-0 or 3-2, or whether they won a set 6-0 or 6-3. Only the end result matters.

Chess, of course, is not identical to tennis - one chess game can be longer than an intense tennis match, and, of course, we shouldn't fully discount such work in the Elo calculations. But still, the scored points should be a secondary factor to the place taken in a tournament, and there should be some kind of correction coefficient for that. Of course, to refine the rating system, we'll need to solve many complex problems. For instance, if we implement this new place-based method, what should we do with the Swiss tournaments, and what scale should we use?

These and many other questions should be answered by both chess experts and mathematicians. But we shouldn't immediately put them together at one table: there would be little chance of reaching an agreement, because both sides will try to pull the rope towards themselves; the chess players will appeal to emotions, and the mathematicians to numbers. Let the GMA create a competent group of grandmasters who come up with a number of their own proposals and recommendations and give them to the mathematicians, for them to determine if these proposals are viable or come up with their own. Only after that should the GMA adopt the final project (probably some kind of compromise between the two viewpoints) of the "rating law".
