# Skill and luck in high-level chess competitions

Many chess fans and experts seem to believe that an alternative design of the Candidates Matches would better achieve the goal of selecting the “best” challenger for a match against the current World Champion. David H. Krantz, Professor of Psychology and Statistics at Columbia University in New York, sent us an essay with "some probabilistic analyses of the results that could be expected from various possible designs for Candidates’ competition".

For purposes of calculation, I assume that the W, D, L parameters at the start of this Candidates match were those given in the top rows of Table 4, but that after game #1, which was drawn after Aronian had failed to cash in a large advantage with White, Grischuk adopted a cautious strategy, seeking three more draws.

The initial parameters correspond to an Elo advantage of 120 points for Aronian as White and 0 difference with Grischuk as White. Had these parameters remained in effect for games 2–4, the chance of an overall draw would have been 34%, with the remainder split about 2:1 in favor of Aronian (44% to 22%). The assumption here is that cautious play – seeking no advantage and offering a draw with White at the first opportunity – can in fact reduce the effective Elo advantage for Aronian with White. There is no hope, long term, in such play: in a long match with these parameters, Aronian would be bound to win. But by sacrificing the hope of a win in the 4-game match, Grischuk could obtain a large probability of going over to a tie-break, which would be more favorable. (In 6 previous rapid games, all at the Amber tournaments, each had a score of +1 in his 3 White games.) In fact, in the Candidates match, Grischuk did offer a draw with White in both game #2 (22 moves) and game #4 (17 moves); Aronian went along with this, and Grischuk did, with a bit of difficulty, attain a draw in game #3 as well. Grischuk then won the 4-game rapid playoff (+2 =1 -1).

This series of matches produced a chorus of complaints in on-line commentary. I will mention five partially distinct criticisms: (i)

It is hard for an amateur player to quarrel with criticism (i). There may indeed be interesting nuances in the first 14-20 moves, but I don’t appreciate them unless they are pointed out, and not always then.

The present analysis undercuts criticism (ii), by showing the advantage that can accrue to a drawing strategy. As Kramnik pointed out in an interview, there is a great deal at stake in these matches, and one cannot blame a competitor for adopting a strategy that gives the best shot at winning. Also, a strong opponent can dodge ultra-caution by refusing draw offers in equal but playable positions.

The present analysis shows that, at least on a qualitative level, problems (iii) and (v) can only be reduced a bit, but not fixed by longer matches. Table 3 shows that a strong possibility of a draw or of a win by the weaker player persists even with quite long matches.

The main hope, with longer matches, would be to eliminate (iv), the reward for ultra-cautious play, and thereby also reduce the number of uninteresting drawn games. Kramnik argues, correctly I believe, that this factor would be much reduced even in 6 or 8 game matches. That, in turn, would ameliorate criticisms (i) and (ii). Kramnik argues for a double-round Candidates tournament, and this, again, would probably do much to eliminate criticisms (i), (ii), and (iv).

However, longer matches or tournaments will not eliminate the role of luck or the importance of a separate tie-break mechanism.

One might think that this is not a serious problem, since if the Black player draws game #1, then the advantage switches sides. For k = 6, how can it matter who played White first? However, this is an issue better handled by calculation than by intuition.

I used computer simulation to determine the percentage of matches won by the player with initial White, for k = 2, 4, 6, 8, and 10. Here, the two players were assumed equal in strength, with W,D,L probabilities = 30%, 60%, 10% for the White player.

The interesting point is that the bias favoring first White does not disappear, even with play to 10 wins. In a match to 6 wins, the 51.3% probability is equivalent to about 9 extra Elo points per game for the player who begins with White.
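The first-White bias can also be checked without simulation. The sketch below is not the author's program: it is an exact recursion under the assumptions that colors alternate strictly every game, that draws do not count, and that the match ends at the first player to reach k wins. The function name `first_white_win_prob` and its structure are hypothetical. The key step is that, with draw probability d, the next decisive game is played with the current White holder as White with probability 1/(1+d).

```python
from functools import lru_cache

def first_white_win_prob(k, w=0.30, d=0.60, l=0.10):
    """Probability that the player who has White in game 1 wins a match
    played to k wins, for two equal players (assumed model, see lead-in)."""
    # Chance that the next decisive game has the current White holder as White:
    # (1 - d) * (1 + d^2 + d^4 + ...) = 1 / (1 + d).
    q_same = 1.0 / (1.0 + d)
    q_other = d / (1.0 + d)
    # Conditional on a decisive game, White wins with pw and Black with pb.
    pw = w / (w + l)
    pb = l / (w + l)

    @lru_cache(maxsize=None)
    def f(a, b, a_white):
        """Win probability for player A with score a:b in wins;
        a_white says whether A has White in the next game."""
        if a == k:
            return 1.0
        if b == k:
            return 0.0
        if a_white:
            # Decisive game with A as White: B has White afterwards.
            same = pw * f(a + 1, b, False) + pb * f(a, b + 1, False)
            # Decisive game with B as White: A has White afterwards.
            other = pb * f(a + 1, b, True) + pw * f(a, b + 1, True)
        else:
            # Decisive game with B as White: A has White afterwards.
            same = pw * f(a, b + 1, True) + pb * f(a + 1, b, True)
            # Decisive game with A as White: B has White afterwards.
            other = pw * f(a + 1, b, False) + pb * f(a, b + 1, False)
        return q_same * same + q_other * other

    return f(0, 0, True)

if __name__ == "__main__":
    for k in (2, 4, 6, 8, 10):
        print(f"k = {k:2d}: first White wins {first_white_win_prob(k):.4f}")
```

If these assumptions match those of the simulation, the printed value for k = 6 should be close to the 51.3% quoted above; any gap would point to a difference in the assumed match rules rather than to sampling error.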

Of course, some pairs of restaurants and some pairs of chess players have very clear comparisons, and the clear comparisons usually are transitive. If A is clearly better than B, and B than C, then also A will be clearly better than C. But the ordering of “clearly better than” is not complete – many pairs remain unranked.

What can one do in order to attain the much-beloved rankings? One can find some method of assigning numbers to entities, and since numbers are ordered in transitive fashion, the

The other thing to do is to admit that there is no perfect method, that the entities cannot be ranked fully, and to accept that situation. One can still have a World Champion and a Challenger, and find great entertainment value in their chess competitions, without pretending that one is better than the other, either before or after they play a match. The winner of a match is the winner, but is not thereby defined as better. Even if there were such a thing as “better” the statistical barrier (Table 3) shows that the winner of a match may very well not be better. For both statistical and conceptual reasons, it seems best to forget about better and worse and to reward players for winning as well as for beautiful games.

David Bronstein advocated this view of things, but it is associated with a notion that chess is art, rather than sport or science. I believe his view is correct, no matter whether one thinks of art, sport, or science.

From this standpoint, the main flaw in the current cycle in chess is the one pinpointed by Kramnik (among others): short matches at classical time controls encourage cautious rather than sharp games. This is exacerbated by White’s initiative at classical time controls: when White offers a draw, it is hard for Black not to accept.

The simplest solution might be to give up on the classical time control. The current Anand-Shirov match in León is a valuable experiment in this direction. The time control is still long enough to allow some detailed evaluation of complex plans, yet short enough to permit 2 games per day and correspondingly longer matches. This is still mostly “classical” chess – the main loss is the ability to think for an hour in the opening to deal with a dangerous novelty or with an original nuance in move ordering. It may thus place even greater emphasis on preparation – novelties may be more valuable than ever.

There seems to be a strong sentiment in the world of chess to continue the tradition that has given us 15 “real” world champions, from Steinitz to Anand. I confess that I share this sentiment, though I find it difficult to justify. The main point is that a new champion must gain the crown by winning a match against the current champion.

There are two difficulties here, both shown in Table 3. First, a match may be drawn, as Lasker-Schlechter, Botvinnik-Bronstein, Botvinnik-Smyslov, Kasparov-Karpov IV, Kramnik-Leko, and Kramnik-Topalov (in classical time control) were all drawn. Second, there is a substantial chance that the weaker of two players will win the match. The second problem can be dismissed, if we give up our belief in rankings. The winner is not necessarily better; the winner is simply Champion (or Challenger, in Candidates matches or tournaments).

The problem of a drawn match, however, will not disappear, even if it is reduced through longer matches. Here, too, the current Anand-Shirov experiment may point the way. If one can have a two-game mini-match in a single day, then it seems feasible, for the World Championship and possibly also for other very important competitions, to extend a drawn match by increments of 2-game mini-matches. There is already one day designated for tie-break; this could be changed to a potential two or three days for tie-breaks, using 2-game mini-matches at an intermediate (45 or 60 min) time control.

Notes: 1) The W,D,L parameter specification has the same count of independent parameters as Table 2, but less scope, e.g., no possible W,D,L parameters yield L2 = 50% = W2. I use the W,D,L parameters to get realistic values for Table 2. 2) The statistical model formulated in this article is a model for pairwise competition. It can be converted into an Elo difference between the two players, dependent on who has White, but these Elo differences are not necessarily the same as differences on the overall Elo scale. In fact, the model is compatible with pairwise intransitivity – A can be better than B, and B than C, yet C can be better than A. The three Elo differences cannot be mapped into differences along a uni-dimensional scale.

*By David H. Krantz*

Much dissatisfaction has been voiced concerning the design of the Candidates’ matches recently concluded in Kazan (May 2011). Many seem to believe that some alternative design would better achieve the goal of selecting the “best” challenger for a match against the current champion. In this article I first present some probabilistic analyses of the results that could be expected from various possible designs for Candidates’ competition. These analyses suggest that it is impossible to find a practical design that has a high probability of identifying the “best” challenger. The penultimate section of the article questions the very concept of best challenger.

## 1. Skill and luck in free throws: A partial analogy

It is instructive to consider first a problem that is considerably simpler: selecting the better of two free-throw shooters (in basketball) by a head-to-head competition between them. The analogy has one great strength: basketball, like chess, is a game of skill, not luck; yet, as will be seen, luck is involved also.

Free throws are simpler than chess for three reasons. First, the concept of “better” is pretty clear – one just means the player with the higher “true” accuracy percentage in free throws. If Player A makes 88.1% and Player B only 87.7% (these are very high, even among elite basketball players), then one declares that A is better than B. Second, there need be no interaction between the two. Player B does not guard A or defend against free throws. In particular, the comparison is inherently transitive: if A is better than B, and B better than C, it follows that A is better than C. This need not always be true in chess. Third, there is nothing comparable to the White advantage that usually prevails in high-level chess. To pass from basketball to chess, one must usually consider the 2-game mini-match, with alternate assignment of White and Black, as the competitive unit. Most designs under consideration involve double round-robin tournaments or matches with an even number of games, alternating colors. The simplifications for free-throw competition allow probabilistic issues to stand out clearly.

The obvious contest design is to have Players A and B each attempt N free throws, under some standardized conditions: the one with the higher percentage is declared the winner.

As a concrete example, suppose that the true percentage for A is 87.5% (7/8) while the true percentage for B is 83.33% (5/6). If N=10, then according to a binomial model (constant percentages, with each free throw result independent of any other), the probability that A wins is about 47.5%, while the probability of a tie outcome is 25.0% and the probability that B wins is 27.5%. There is a fair chance (1 in 4) of a tie, and even a slightly higher chance that the “wrong” candidate will be selected.
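The binomial calculation behind these figures can be reproduced in a few lines. This is a minimal sketch, assuming exactly the model stated above (constant success probabilities, independent throws); the function names `binomial_pmf` and `contest_probabilities` are illustrative and not from the article.

```python
from math import comb

def binomial_pmf(n, p):
    """Return [P(X = 0), ..., P(X = n)] for X ~ Binomial(n, p)."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def contest_probabilities(n, p_a, p_b):
    """Return (P(A wins), P(tie), P(B wins)) when each player attempts n throws."""
    pmf_a = binomial_pmf(n, p_a)
    pmf_b = binomial_pmf(n, p_b)
    # A wins when A makes i throws and B makes strictly fewer than i.
    a_wins = sum(pmf_a[i] * pmf_b[j] for i in range(n + 1) for j in range(i))
    # A tie occurs when both make the same number of throws.
    tie = sum(pmf_a[i] * pmf_b[i] for i in range(n + 1))
    return a_wins, tie, 1.0 - a_wins - tie

if __name__ == "__main__":
    for n in (10, 20, 40, 100):
        a, t, b = contest_probabilities(n, 7 / 8, 5 / 6)
        print(f"N = {n:3d}: A wins {a:.3f}, tie {t:.3f}, B wins {b:.3f}")
```

For N = 10 the three values come out at roughly 0.475, 0.250 and 0.275, in agreement with the percentages quoted above.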

The preceding calculation shows the imperfection of the contest design. But free throws are cheap and quick – one should surely have N=20. Indeed, this helps. The probability that the better candidate is chosen increases, from 47.5% (N=10) to 55.9% (N=20). The probability of a tie drops to about 16.8%, while the probability that the lesser candidate is chosen does not get much smaller: it drops only from 27.5% (N=10) to 27.2% (N=20). One can get some insight into this by looking at the probabilities of some specific combinations of results from the two players’ 20 attempts. The table following shows most of what is likely to happen in such a contest: the outcomes shown have nearly 96% total probability, since neither free-throw shooter is apt to fall below 14 out of 20 attempts.

The main diagonal of the table (bold italic) shows the tied scores: 14-all, 15-all, etc. These rounded numbers already total to 16.8% (there would be scarcely any ties at scores below 14 for both players). The diagonal just below the main, shown in large bold font, shows the combinations where Player A wins by exactly 1 free throw, 15-14, 16-15, etc. These total 17.9%; but the diagonal above the main, where the lesser player wins by exactly one, is far from negligible, totalling 12.8%. A contest with N=20 helps, but there remains a substantial chance of a tie or of a victory by the weaker candidate.

One can of course go to N=40 or even N=100. For N=40, the probability of a tie drops to about 11%, while the probability of selecting the stronger candidate advances to 64.5%; there remains nearly a 25% chance of selecting the weaker candidate. For N=100, one achieves a 77% chance of selecting the stronger candidate – still, far from a sure thing. Even for N=400 there is still over 5% chance of a tie or selecting the weaker candidate. A “satisfactory” contest requires N ≈ 1000; this produces over a 99.5% chance that the stronger candidate wins (if he or she can sustain skill for 1000 free throw attempts).

These results, of course, depend on having a contest between elite free-throw shooters. In a contest where A is excellent (80%) and B is mediocre (60%), even a contest with N=10 yields over 77% chance that the stronger candidate wins, and N=40 yields almost 97%.

Our strong intuition that a *practical* **contest can succeed in picking the better player is not too badly wrong, when the worse player is much less than excellent and the better one is excellent; but it fails when both are elite. In the latter case, the chance of a tie or a reversal of ordering is substantial unless the contest is very long.**

Why does one engage in probability calculations for a game of “pure skill” such as basketball? The answer is clear: someone who makes 7 out of every 8 free throws is indeed highly skilled, but her performance also varies: sometimes she only makes 6 out of 8, sometimes all 8. Even if 7 are made out of 8, which one of the 8 is missed? This cannot be predicted; probabilistic models are needed, and the binomial model works well.

Some of the skilled player’s free throws will “swish” through the hoop – not even touching the rim. Nearly every shot will at least hit the rim of the hoop. Of those that touch the rim, a majority go through, but some deflect just wrong. Basketball players’ motor programs produce very little variability – the skilled free throw shooter doesn’t miss entirely – but the slight variability is still enough to make probabilistic models the only recourse.

## 2. Skill and luck in chess

Chess involves pattern recognition and planning, rather than motor programs (although blitz requires manual dexterity as well). Pattern recognition is pretty reliable – grandmasters rarely overlook mate in 1. Even so, grandmasters do sometimes miss a pattern that they would usually see. If it is a bit obscure, perhaps the probability that grandmasters will see it is 7 out of 8; but the 8th instance requires a probabilistic model. If both contestants see it, fine, it goes into commentary rather than being played on the board; if both overlook it (unlikely) the game may go on as though both had seen it; but if one sees it, and the other doesn’t, the one who missed it may lose. Such a loss is “bad luck” in the sense that the loser would usually see such a pattern, but happened to miss it on this occasion.

Planning can be very subtle, but many plans are small variations of familiar plans that have been used before. With enough time, a prospective plan can be checked in detail for flaws. Under time pressure, checking is incomplete, and again, the existence of a major flaw, which will come to light at the crunch, can be viewed as a matter of luck.

In chess, as in free-throw shooting, contests depend on a mixture of skill and luck. Thus, some sort of probabilistic modeling is needed.

## 3. The 2-game mini-match: Win, Draw, or Lose, with White and Black

The standard probabilistic model in chess is the Elo model, but it is not satisfactory for understanding the probabilities of various outcomes of a match. The Elo model focuses on the average score of single games between two players. It does not distinguish decisive games from draws: an average score of 0.75 can come from a probability of 75% that Player A wins, 0% draw, 25% that B wins, or, at the opposite extreme, probability 50% that A wins and 50% draw. In addition, this model does not include any consideration of who has White – it would apply to the average score when Player A is White and when Player A is Black. The simplest model that can be used to analyze a match with alternating White and Black takes as its unit the results of a two-game mini-match. For example, the model could be specified as in Table 2, using 5 probabilities with the symbolic notation (L2, L, D, W, W2):

In example 1, Player A has an average score of 0.55 in a 2-game mini-match versus Player B (Elo difference about 35 points). This example has a high percentage of decisive games: the probability of a drawn mini-match, which includes the probability that both games are drawn, plus probability of two White wins (and less likely, two Black wins) is only D = 41%. Example 2 has almost the same average score (0.55) and same Elo difference, but it exhibits a much lower percentage of decisive games and correspondingly higher percentage (D = 66%) of a drawn 2-game mini-match.

These models have 4 logically independent parameters, because the 5 symbolic probabilities must satisfy the equation L2 + L + D + W + W2 = 1. (In example 1, the sum is 0.999 due to rounding to 3 decimal places.) An alternative and more intuitively appealing way to specify the 4 parameters is to state the probabilities of win, draw and loss (W,D,L) for each player when that player is White. This comes to 3 probabilities per player; but only two independent parameters, since the sum of each set of 3 probabilities W + D + L must be 1.

Example 1 was calculated by assigning probabilities W,D,L = 35%, 55%, 10% when Player A has White, and W,D,L = 25%, 55%, 20% when B has White. This model assumes 45% decisive games, with overall 30% probability of a White win (averaging 35% and 25%) and overall 15% probability of a Black win (averaging 10% and 20%). Example 2 was calculated with W,D,L = 20%, 80%, 0% when A is White (A never loses to B with White!) and 10%, 80%, 10% when B is White (when B fails to draw with White, he or she is just as likely to lose as to win). In each example, the advantage of Player A over B is 10% both with White and with Black. One can also vary this assumption, formulating models where the stronger player may have an advantage mainly with White, not with Black, or the opposite. The W,D,L parameters specify the chance of a draw, with each assignment of colors, and the advantage (if any) of one player over the other, again with each assignment of colors. This alternative way of specifying parameters again has only 4 independent parameters, since W + D + L = 1 for each player’s White role.¹
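The passage from W,D,L parameters to the five mini-match probabilities is a short product-and-sum calculation. The sketch below shows it under the assumption that the two games of a mini-match are independent; the function name `mini_match_distribution` and the dictionary layout are illustrative choices, not part of the article.

```python
def mini_match_distribution(a_white, b_white):
    """Return the 2-game mini-match distribution from Player A's viewpoint.

    a_white = (W, D, L) when A has White, from A's viewpoint.
    b_white = (W, D, L) when B has White, from B's viewpoint.
    Keys: W2 = A wins 2:0, W = A wins 1.5:0.5, D = 1:1,
          L = A loses 0.5:1.5, L2 = A loses 0:2.
    """
    wa, da, la = a_white
    wb, db, lb = b_white
    # In the game where B has White, A wins with probability lb
    # and loses with probability wb.
    return {
        "W2": wa * lb,
        "W": wa * db + da * lb,
        "D": wa * wb + da * db + la * lb,
        "L": da * wb + la * db,
        "L2": la * wb,
    }

if __name__ == "__main__":
    example_1 = mini_match_distribution((0.35, 0.55, 0.10), (0.25, 0.55, 0.20))
    example_2 = mini_match_distribution((0.20, 0.80, 0.00), (0.10, 0.80, 0.10))
    for name, dist in (("Example 1", example_1), ("Example 2", example_2)):
        print(name, {key: round(value, 4) for key, value in dist.items()})
```

With the parameters of Example 1 this gives D = 0.41, and with those of Example 2 it gives D = 0.66 and L = 0.08, matching the values quoted in the text.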

## 4. Alternating-color matches of length greater than 2

One can extend the preceding model to matches with a fixed even number of games, such as 4, 6, etc., by assuming that such a match consists of a succession of two-game mini-matches, and that the principles of **stationarity** (constant parameters) and **probabilistic independence** extend across these mini-matches. Thus, each 2-game mini-match yields results with probabilities given by a 4-parameter model, as in the preceding section; these 4 parameters remain **constant** across the mini-matches, and the probability of a particular sequence of mini-match outcomes is obtained simply by multiplying the probabilities of the successive outcomes. For example, in a 6-game match, the probability of the successive two-game outcomes 1:1, 1:1, and ½:1½ (as recently occurred in the 2011 Candidates final between Grischuk and Gelfand) would be the product D × D × L, or (using parameters from Example 2 in Table 2) (.66)(.66)(.08) = .035 (3½%). The probability of a 6-game match with only one decisive game, won by Gelfand, would be calculated by summing over all three sequences of 2-game mini-matches that yield such a result (the single decisive game could have come instead in the first or second 2-game mini-match).

My goal, in introducing this model, is not to predict the outcomes of specific two-player matches; rather, it is to **demonstrate the limitations** on the use of a match to select the better of two elite players. In Example 1 (Table 2), the probability that the stronger player will be selected by the 2-game match is only about 37%. The weaker player will be selected instead with probability about 22%, and in the remaining 41% of cases, the 2-game match will decide nothing. Example 2 yields a probability of only 26% of correctly selecting the stronger player.

To extend this to longer matches, I carried out similar calculations of probabilities (to select the weaker or the stronger player) for matches of length 2, 4, 6, 8, 12, 16, 20, 24, 36 and 60 games, using the parameters of examples 1 and 2 and assuming stationarity and independence, as above. (This type of model is known as multinomial.) The results are shown in Table 3. Match lengths 36 and 60 are included in order to dramatize the difficulty of selecting the stronger player even with matches too long to be practical.
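A calculation of this kind can be sketched by repeatedly convolving the mini-match distribution with itself. The code below is one possible implementation, not necessarily the one used for Table 3. The unrounded mini-match probabilities are my own computation from the W,D,L parameters quoted in section 3, and the names `EXAMPLE_1`, `EXAMPLE_2` and `match_probabilities` are hypothetical.

```python
# Margin (in points) for the stronger player in one 2-game mini-match,
# mapped to its probability: -2 = L2, -1 = L, 0 = D, +1 = W, +2 = W2.
# Assumption: values derived from the W,D,L parameters of section 3.
EXAMPLE_1 = {-2: 0.025, -1: 0.1925, 0: 0.41, 1: 0.3025, 2: 0.07}
EXAMPLE_2 = {-2: 0.0, -1: 0.08, 0: 0.66, 1: 0.24, 2: 0.02}

def match_probabilities(mini, games):
    """Return (P(stronger wins), P(drawn match), P(weaker wins)) for a match
    of `games` games, assuming stationary, independent mini-matches."""
    if games <= 0 or games % 2:
        raise ValueError("games must be a positive even number")
    dist = {0: 1.0}  # distribution of the running margin before any game
    for _ in range(games // 2):
        nxt = {}
        for margin, p in dist.items():
            for step, q in mini.items():
                nxt[margin + step] = nxt.get(margin + step, 0.0) + p * q
        dist = nxt
    win = sum(p for margin, p in dist.items() if margin > 0)
    draw = dist.get(0, 0.0)
    loss = sum(p for margin, p in dist.items() if margin < 0)
    return win, draw, loss

if __name__ == "__main__":
    for games in (2, 4, 6, 8, 12, 16, 20, 24, 36, 60):
        w1, d1, l1 = match_probabilities(EXAMPLE_1, games)
        w2, d2, l2 = match_probabilities(EXAMPLE_2, games)
        print(f"{games:2d} games  Ex.1: W {w1:.3f} D {d1:.3f} L {l1:.3f}"
              f"   Ex.2: W {w2:.3f} D {d2:.3f} L {l2:.3f}")
```

For a 2-game match this reproduces the 37% / 41% / 22% split of Example 1. For 4 games the loss probability of the stronger player is slightly higher than for 2 games in both examples, which is the kind of effect that intuition alone tends to miss.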

Three points are worth noting from this table.

**First**, longer matches lead to lower Draw probability and greater probability of a Win for the stronger player; but even a 24-game match leaves a substantial chance of a Draw and another substantial chance of a Loss for the stronger player.

Historical note: During the 20th century, there were in fact 16 matches of scheduled length = 24 games at the world championship level (this includes the Karpov-Korchnoi final Candidates match in 1974). Of these, 3 ended in a Draw. The overall percentage of decisive games varied: 15 out of 21 were decisive in the 2nd Tal-Botvinnik match, while only 5 out of 24 were decisive in the 1974 Karpov-Korchnoi match. Across all 16 matches, 146 out of 363, or about 40%, were decisive.

*Even matches of impractical length (36 or 60 games) would not afford high assurance that the stronger player will win.* (Note also that for true differences that are much smaller than 35 Elo points, the difficulty becomes correspondingly more severe.)

**Second**, longer matches reduce the probability of a drawn match, but the differences between 4, 6, and 8 games are not overwhelming. In this range, Draw probability drops from under 30% (4 games) to about 20% (8 games), under the assumptions of Example 1, or from under 50% to about 30% for the high-draw assumptions of Example 2. *The problem of finding a suitable tie-breaker will not disappear with longer Candidates’ matches.*

**Third**, for matches of modest length, the probability of an outright Loss for the stronger player does not decrease; it actually *increases slightly* for 4, 6, or 8 game matches as compared with a 2-game mini-match. This may seem counter-intuitive, but it can be understood qualitatively from the fact that a 2-game mini-match can be lost only by 0:2 or ½:1½, whereas a 4-game match can be lost via two distinct main paths. The stronger player can lose the first 2-game mini-match and then fail to catch up, or can draw the first 2-game mini-match and lose the second one. (There is also a third path – the stronger player can win the first mini-match by 1 point, 1½:½, and then lose the second one, 0:2 – but this is an unlikely path.) Only for long enough matches does the probability of Loss for the stronger player decrease again (eventually approaching 0).

## 5. Comments on particular interesting matches

The model presented above can be illustrated usefully by some recent and not-so-recent matches.

**Karpov-Kasparov I (1984-5): A demonstration of non-stationarity**

This was an unlimited match; thus, calculations such as those in Table 3 do not apply. This match does, however, illustrate how the assumption of stationarity of W,D,L parameters can fail. In the first 9 games, Karpov attained a lead of 4:0, with 3 White wins, 1 Black win, and 5 draws. All the draws but one (game 8) were contested sharply. Kasparov’s confidence was shaken and Karpov seemed likely to finish the match quickly by attaining 6 wins. To delay this, and to recover confidence, Kasparov embarked on a drawing strategy: avoiding sharp, risky lines and, with White, offering a draw as soon as Black had equalized (overcome White’s initial pressure). The 10th through 26th games were drawn, with the even-numbered ones (Kasparov as White) agreed drawn, mostly on White’s proposal, and mostly after 22 or fewer moves. The W,D,L parameters for Kasparov’s White games during this phase were close to 0% Win, 100% Draw, 0% Lose. Of course, these parameters are produced by the combination of both players’ decisions: Karpov had to agree to the draws that Kasparov offered.

In their later matches, comprising 96 games (1985-1990), there were 25 White wins, 7 Black wins, and 64 draws.

**Aronian-Grischuk (2011 Candidates)**

The Elo ratings for these players at the time of the match were respectively 2808 and 2747, i.e., the estimated difference was a bit over 60 points. The Elo model gives an average score of about 0.58 per game played when the true difference is about 60 points. This might correspond roughly to an Elo difference of 120 points when Aronian is White and 0 points (evenly matched) when Grischuk is White. This is at least not refuted by the record of their games with classical time controls *prior* to the Candidates match: Aronian with White won 4, drew 6, and lost 0, while Grischuk with White won 2, drew 4, and lost 2.

For purposes of calculation, I assume that the W,D,L parameters at the start of this Candidates match were those given in the top rows of Table 4, but that after game #1, which was drawn after Aronian had failed to cash in a large advantage with White, Grischuk adopted a cautious strategy, seeking three more draws.

The initial parameters correspond to an Elo advantage of 120 points for Aronian as White and 0 difference with Grischuk as White. Had these parameters remained in effect for games 2–4, the chance of an overall draw would have been 34%, with the remainder split about 2:1 in favor of Aronian (44% to 22%). The assumption here is that cautious play – seeking no advantage and offering a draw with White at the first opportunity – can in fact reduce the effective Elo advantage for Aronian with White. There is no hope, long term, in such play: in a long match with these parameters, Aronian would be bound to win. But by sacrificing the hope of a win in the 4-game match, Grischuk could obtain a large probability of going over to a tie-break, which would be more favorable. (In 6 previous rapid games, all at the Amber tournaments, each had a score of +1 in his 3 White games.) In fact, in the Candidates match, Grischuk did offer a draw with White in both game #2 (22 moves) and game #4 (17 moves), Aronian went along with this, and Grischuk did, with a bit of difficulty, attain a draw in game #3 as well. Grischuk then won the 4-game rapid playoff (+2 =1 -1).
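The Elo arithmetic quoted above for Aronian-Grischuk can be checked with a short sketch. It assumes the common logistic form of the Elo expected-score curve (the article does not say which form of the Elo model was used), and the function name `elo_expected_score` is hypothetical.

```python
def elo_expected_score(diff):
    """Expected score per game for a player rated `diff` points higher.

    Assumption: logistic Elo curve with the usual scale of 400 points.
    """
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


# Rating gap at the time of the match, from the ratings quoted in the text.
rating_gap = 2808 - 2747

# Expected score if the gap were the same in every game.
flat = elo_expected_score(60)

# Expected score if the gap is 120 points with White and 0 with Black,
# averaged over one White game and one Black game.
split = (elo_expected_score(120) + elo_expected_score(0)) / 2

print(f"gap = {rating_gap} points")
print(f"flat 60-point gap:         {flat:.3f} per game")
print(f"120 with White, 0 with Black: {split:.3f} per game")
```

Under this assumption both calculations land between 0.58 and 0.59 per game, so splitting the advantage as 120 points with White and 0 with Black is roughly equivalent, on average, to a flat 60-point difference.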

**The 2011 Candidates matches taken as a whole**

Cumulatively, there were 30 games at classical time controls, and they yielded only 1 White win, 2 Black wins, and 27 draws. A reasonable prior expectation would be proportions similar to those in the last four Karpov-Kasparov matches: 26% White wins, 7% Black wins, and 67% draws. The deviation from expectation is large. The Pearson chi-squared test statistic is about 8.3; a value that high or higher would be expected with probability < 2% if the deviations from expectation were governed by chance only. Four out of the six 4-game matches were drawn (including Aronian-Grischuk, as described above) and were resolved by rapid or blitz tiebreakers.

This series of matches produced a chorus of complaints in on-line commentary. I will mention five partially distinct criticisms: (i) **Uninteresting**: the large number of short draws detracts from spectator interest and from subsequent reader and historical interest. (ii) **Unsporting**: the short draws suggest lack of serious effort. (iii) **Excessive role for luck**: the “best” challenger has a poor chance of winning, because too much depends on luck in short matches. (iv) **Excessive reward for ultra-cautious play by the weaker player**: in longer matches, an ultra-cautious strategy, such as the one I attributed to Grischuk, would have little chance of success, and therefore would not be used. (v) **Departure from “classical” chess**: the winners are determined by methods that are not valid indicators of superiority in slow-play chess; just as in point (iii), the “best” challenger may not win.

It is hard for an amateur player to quarrel with criticism (i). There may indeed be interesting nuances in the first 14-20 moves, but I don’t appreciate them unless they are pointed out, and not always then.

The present analysis undercuts criticism (ii), by showing the advantage that can accrue to a drawing strategy. As Kramnik pointed out in an interview, there is a great deal at stake in these matches, and one cannot blame a competitor for adopting a strategy that gives the best shot at winning. Also, a strong opponent can dodge ultra-caution by refusing draw offers in equal but playable positions.

The present analysis shows that, at least on a qualitative level, problems (iii) and (v) can only be reduced a bit, but not fixed by longer matches. Table 3 shows that a strong possibility of a draw or of a win by the weaker player persists even with quite long matches.

The main hope, with longer matches, would be to eliminate (iv), the reward for ultra-cautious play, and thereby also reduce the number of uninteresting drawn games. Kramnik argues, correctly I believe, that this factor would be much reduced even in 6 or 8 game matches. That, in turn, would ameliorate criticisms (i) and (ii). Kramnik argues for a double-round Candidates tournament, and this, again, would probably do much to eliminate criticisms (i), (ii), and (iv).

However, longer matches or tournaments will not eliminate the role of luck or the importance of a separate tie-break mechanism.
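The chi-squared figure quoted earlier in this section for the 30 classical games can be reproduced with a short sketch. It assumes a Pearson goodness-of-fit test against the rounded reference proportions (26%, 7%, 67%) with 2 degrees of freedom; the variable names are hypothetical. For 2 degrees of freedom the upper-tail probability has the closed form exp(-x/2), so no statistics library is needed.

```python
import math

# Observed results of the 30 classical games in the 2011 Candidates matches.
observed = {"white_wins": 1, "black_wins": 2, "draws": 27}

# Reference proportions from the later Karpov-Kasparov matches,
# rounded as in the text (an assumption about how the test was set up).
reference = {"white_wins": 0.26, "black_wins": 0.07, "draws": 0.67}

n_games = sum(observed.values())

# Pearson statistic: sum over categories of (observed - expected)^2 / expected.
chi2 = sum(
    (observed[k] - n_games * reference[k]) ** 2 / (n_games * reference[k])
    for k in observed
)

# Three categories with fixed proportions give 2 degrees of freedom;
# for 2 degrees of freedom, P(X >= x) = exp(-x / 2).
p_value = math.exp(-chi2 / 2)

print(f"chi-squared = {chi2:.2f}, p = {p_value:.4f}")
```

Almost all of the statistic comes from the shortfall in White wins and the excess of draws; the Black-win count is close to its expected value.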

## 6. Unlimited matches to k wins

Some famous players have advocated these, notably Capablanca (k = 6) and Fischer (k = 10). It seems doubtful that these ideas will make a comeback in our day. However, in case anyone is hankering romantically after the good old days, I want to point out a purely chessic drawback of unlimited matches: the unearned and unfair advantage accruing to the player with White in game 1 (assuming that the colors alternate thereafter).

One might think that this is not a serious problem, since if the Black player draws game #1, then the advantage switches sides. For k = 6, how can it matter who played White first? However, this is an issue better handled by calculation than by intuition.

I used computer simulation to determine the percentage of matches won by the player with initial White, for k = 2, 4, 6, 8, and 10. Here, the two players were assumed equal in strength, with W,D,L probabilities = 30%, 60%, 10% for the White player.

The interesting point is that the bias favoring first White does not disappear, even with play to 10 wins. In a match to 6 wins, the 51.3% probability is equivalent to about 9 extra Elo points per game for the player who begins with White.
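The first-White bias can also be examined without simulation. The sketch below swaps in an exact recursion over match states in place of the simulation used for the article, under the same assumptions (equal players, W,D,L = 30%, 60%, 10% for White, colors alternating every game). The function names `first_white_win_probability` and `elo_equivalent` are hypothetical, and the Elo conversion assumes the logistic Elo curve.

```python
import math
from functools import lru_cache

# White's win, draw, and loss probabilities, as given in the text.
W, D, L = 0.30, 0.60, 0.10


def first_white_win_probability(k):
    """Probability that the player with White in game 1 wins a match to k wins."""

    @lru_cache(maxsize=None)
    def state(a, b):
        """Return (x, y) for the score a:b in wins.

        x = P(player A wins the match) if A has White in the next game,
        y = the same probability if A has Black in the next game.
        """
        if a == k:
            return 1.0, 1.0
        if b == k:
            return 0.0, 0.0
        a_up = state(a + 1, b)
        b_up = state(a, b + 1)
        # Colors switch after every game, so a decisive game leads to the
        # opposite-color value of the next score, and a draw leads to the
        # opposite-color value of the same score:
        #   x = p + D * y,   y = q + D * x
        p = W * a_up[1] + L * b_up[1]  # decisive outcomes when A has White
        q = L * a_up[0] + W * b_up[0]  # decisive outcomes when A has Black
        x = (p + D * q) / (1 - D * D)
        y = (q + D * p) / (1 - D * D)
        return x, y

    return state(0, 0)[0]


def elo_equivalent(prob):
    """Elo difference giving expected score `prob` (logistic curve assumed)."""
    return 400.0 * math.log10(prob / (1.0 - prob))


for k in (2, 4, 6, 8, 10):
    p = first_white_win_probability(k)
    print(f"k = {k:2d}: first White wins the match with probability {p:.4f}")

# Converting the article's 51.3% figure for k = 6 into Elo points.
print(f"51.3% corresponds to about {elo_equivalent(0.513):.1f} Elo points")
```

Because draws do not change the score but do switch the colors, each score has to be solved as a pair of linked equations, one for each color assignment; that is what the two closed-form lines for `x` and `y` do.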

## 7. Can chess players be rank ordered?

People love rankings. However, the entities being ranked are usually multi-dimensional. Restaurant A has better meat dishes than B, but fish dishes are better at B; desserts are also better at B, but service is more professional at A. Of course, each of these dimensions can be broken down further. Service involves introductions, making suggestions, taking orders, etc. Ranking restaurant A versus B depends on the weightings for the various factors. These weightings are of course a matter of individuals’ tastes but may not even be stable for an individual: the weight on a given dimension may change depending on whether the differences observed along that dimension are small or large. A large difference not only matters, but swells the importance weight for that dimension. A critic might rate A better than B, on the basis of certain dimensions, and B better than C, on the basis of somewhat different dimensions, yet also rate C better than A, based on the dimensions that are most salient in that comparison. Likewise in chess, Aronian performs well against Anand, and Anand likewise against Kramnik, yet Kramnik seems stronger than Aronian, head-to-head.

Of course, some pairs of restaurants and some pairs of chess players have very clear comparisons, and the clear comparisons usually are transitive. If A is clearly better than B, and B than C, then also A will be clearly better than C. But the ordering of “clearly better than” is not complete – many pairs remain unranked.

What can one do in order to attain the much-beloved rankings? One can find some method of assigning numbers to entities, and since numbers are ordered in transitive fashion, the **rankings are simply defined in terms of numbers**. This is a frequent ploy. The numbers may come from subjective ratings (Zagat), from fitting a mathematical model to many different contests (Elo), or from someone’s ratings of relative importance of different items (Cost-of-Living Index). The numbers satisfy a need only until something goes wrong – a comparison turns out different from what was expected. Then one wants to revise the method.²

The other thing to do is to admit that there is no perfect method, that the entities cannot be ranked fully, and to accept that situation. One can still have a World Champion and a Challenger, and find great entertainment value in their chess competitions, without pretending that one is better than the other, either before or after they play a match. The winner of a match is the winner, but is not thereby defined as better. Even if there were such a thing as “better”, the statistical barrier (Table 3) shows that the winner of a match may very well not be better. For both statistical and conceptual reasons, it seems best to forget about better and worse and to reward players for winning as well as for beautiful games.

David Bronstein advocated this view of things, but it is associated with a notion that chess is art, rather than sport or science. I believe his view is correct, no matter whether one thinks of art, sport, or science.

## 8. Recommendations for the championship cycle

One path to follow would be to give up on the mystique and the privileges of the World Champion. This is already the path followed in tennis, in golf, and in women’s chess. The focus is on tournaments and matches. The current leaders may have certain privileges – automatic entry to particular competitions – but not the kinds of privileges that the chess champions from Steinitz to Anand have held. Tennis, golf, and women’s chess rankings are certainly useful as a tool in assembling invitation lists, but they need not be taken seriously as indicating who is truly better or worse. It does not disturb anyone to have one player better than another on clay tennis courts, the other better on grass courts.

From this standpoint, the main flaw in the current cycle in chess is the one pinpointed by Kramnik (among others): short matches at classical time controls encourage cautious rather than sharp games. This is exacerbated by White’s initiative at classical time controls: when White offers a draw, it is hard for Black not to accept.

The simplest solution might be to give up on the classical time control. The current Anand-Shirov match in León is a valuable experiment in this direction. The time control is still long enough to allow some detailed evaluation of complex plans, yet short enough to permit 2 games per day and correspondingly longer matches. This is still mostly “classical” chess – the main loss is the ability to think for an hour in the opening to deal with a dangerous novelty or with an original nuance in move ordering. It may thus place even greater emphasis on preparation – novelties may be more valuable than ever.

There seems to be a strong sentiment in the world of chess to continue the tradition that has given us 15 “real” world champions, from Steinitz to Anand. I confess that I share this sentiment, though I find it difficult to justify. The main point is that a new champion must gain the crown by winning a match against the current champion.

There are two difficulties here, both shown in Table 3. First, a match may be drawn, as Lasker-Schlechter, Botvinnik-Bronstein, Botvinnik-Smyslov, Kasparov-Karpov IV, Kramnik-Leko, and Kramnik-Topalov (in classical time control) were all drawn. Second, there is a substantial chance that the weaker of two players will win the match. The second problem can be dismissed, if we give up our belief in rankings. The winner is not necessarily better; the winner is simply Champion (or Challenger, in Candidates matches or tournaments).

The problem of a drawn match, however, will not disappear, even if it is reduced through longer matches. Here, too, the current Anand-Shirov experiment may point the way. If one can have a two-game mini-match in a single day, then it seems feasible, for the World Championship and possibly also for other very important competitions, to extend a drawn match by increments of 2-game mini-matches. There is already one day designated for tie-break; this could be changed to a potential two or three days for tie-breaks, using 2-game mini-matches at an intermediate (45 or 60 min) time control.

Notes:

1) The W,D,L parameter specification has the same count of independent parameters as Table 2, but less scope, e.g., no possible W,D,L parameters yield L2 = 50% = W2. I use the W,D,L parameters to get realistic values for Table 2.

2) The statistical model formulated in this article is a model for pairwise competition. It can be converted into an Elo difference between the two players, dependent on who has White, but these Elo differences are not necessarily the same as differences on the overall Elo scale. In fact, the model is compatible with pairwise intransitivity – A can be better than B, and B than C, yet C can be better than A. The three Elo differences cannot be mapped into differences along a uni-dimensional scale.

David H. Krantz is Professor of Psychology and Statistics at Columbia University in New York. He grew up in Buffalo, New York, and was a fairly serious chess player in high school, 1952-1956, before the era of ratings. He lost a game to Reshevsky in a simultaneous exhibition around 1953 (exact date has been forgotten). His academic work has been mathematical psychology, specializing first in visual perception, later in probabilistic reasoning, and currently in decision making. He is a founding director of the Center for Research on Environmental Decisions at Columbia University and he teaches a course at Columbia called Introduction to Statistical Modeling in Psychology.