Wait a minute here! I claimed that the formula and the probability tables are not based on real games; I said they were independent of them, and I think of them as designed to be applied to real games. I said that the player's ratings are based on their real games, but the formula used to determine their ratings was independent of the real games. I agree with this redundancy idea (it's the way ratings people basically say, "Well, if I'm wrong, I won't be wrong for long!"), but I can't see what you claim is my misconception.
If the FIDE formula were based on real games that had occurred first, then these must be recognized as the most important games in the history of FIDE chess ratings! What games were these? Who were the players? What events were these?
Whether the probabilities or ratings are the real statistical basis, you haven't shown me which real games the formula is based upon.
And I still disagree with your claim that "the only objective measure that carries any predictive value is ratings." No, I think game history has some predictive value too. Do you think game history between two players has zero predictive value?
"Do you think game history between two players has zero predictive value?"
Of course, but we don't have that info for Anand vs Kosteniuk. The only predictive information we have is their ratings.
"I claimed that the formula and the probability tables are not based on real games; I said they were independent of them, and I think of them as designed to be applied to real games. I said that the player's ratings are based on their real games, but the formula used to determine their ratings was independent of the real games. ... If the FIDE formula were based on real games that had occurred first, then these must be recognized as the most important games in the history of FIDE chess ratings! What games were these? Who were the players? What events were these?"
There's still some misconception here, otherwise you wouldn't be claiming that the prediction table under-estimates or over-estimates. FIDE's prediction table basically tells you: "we predict that this person will perform at the same level as he/she has been performing in recent games." That's all they are telling you. The effect of rating cancels out.
I hear your point: that ratings/formulae are independent of the games. But the misconception is that you are using this point to argue that the predictions are under-estimating one's chances.
Think of it this way: suppose you score 17% against Anand. FIDE will give you a rating of 2517 by definition. Now, when you ask them for a prediction, they'll say: "You have a rating of 2517, 'therefore' you will score 17% against Anand." They are basically doing this:
1) Using your game history to generate an artificial number
2) Using this number to regurgitate back out your game history (sounds pointless, right? but continue...)
3) Predicting that history = future, and therefore you will perform at whatever level you performed at to reach your current rating.
They are simply using this number to record your game history. In fact, these numbers have very little meaning unless they are taken in the context of the prediction table. Every rating system comes with a prediction table. Without the table, ratings are nearly meaningless. Like I said, all they are telling you is that you will do about as well as you did in your previous games.
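For anyone who wants to see the "number in, prediction out" step concretely, here is a rough sketch. It uses the standard logistic Elo approximation, which is an assumption on my part and not FIDE's actual lookup table, and the 2788 opponent rating is assumed purely for illustration.

```python
def expected_score(rating: float, opponent_rating: float) -> float:
    """Logistic Elo approximation of the expected score (win = 1, draw = 0.5).

    This is the common textbook approximation, NOT FIDE's published table.
    """
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


# Assumed ratings, for illustration only.
lower, higher = 2517, 2788

p_lower = expected_score(lower, higher)   # expected score of the 2517 player
p_higher = expected_score(higher, lower)  # expected score of the 2788 player

print(f"lower-rated player:  {p_lower:.2f}")
print(f"higher-rated player: {p_higher:.2f}")
```

Under those assumptions the lower-rated player's expected score comes out at roughly 17%, and the two expected scores sum to 1, which is the sense in which the rating just hands your past results back to you as a prediction.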
I don't see anything obviously biased with that prediction. Maybe in select cases, when a person is rapidly improving, their performance might be higher than their past records suggest. But overall, how can you say that the prediction is biased?
Everything you've said about how the numbers are generated applies equally well to the Elo system. And yet the Elo system definitely underpredicts the performance of lower-rated players.
jkpastorius, the Elo system made a very incorrect assumption about the distribution of one's performance. FIDE's system corrects for that. You've quoted that source about Elo ratings 3 times already, yet your point is to show that the FIDE system under-predicts the weaker player's score.
Like I said, FIDE's prediction is essentially that you will perform at about the same level as you've done in past games. This isn't perfect, and there are parameters that can be adjusted to yield slightly different predictions, but what is it about this system that you find biased in a certain direction?
Right now, your reasons are:
1) The Elo system is biased, so probably FIDE's system is biased as well
2) I believe it's biased (based on what? your intuition?)
In other words, suppose you have a 2517-rated friend who's about to face Anand and is intimidated by FIDE's 17% predicted score. What will you tell them to convince them that the most likely score should be higher? Don't use single examples (we can find just as many counter-examples). How can you convince them that there's a consistent and statistically significant bias in FIDE's system?
It's strange, because a few posts ago you readily agreed about 8-2 being the most likely score. Now, you've changed your mind?
OK, now you're being unfair. "A few posts ago," I went along with your reasoning -- setting aside my own objections. Now, you've held me to my concessions. But I went along with them to see what we'd get from the numbers. Unfortunately, we don't have the probability of Kosteniuk going 4-6. Oh well. But to continue on this path about "what's most likely? what's most likely? what's most likely?" ....
You're right about the FIDE corrections/improvements. But they don't come from real games; they are independent of real games.
What would I tell this friend about the most likely score? Around 17% I suppose, but as with Kosteniuk, I'd add that I think they have a reasonable chance (which we haven't calculated) of significantly out-performing their rating, e.g. winning 4 instead of 2 out of 10. [This might be bolstered if my friend had just split a pair of blitz games with both Anand and Carlsen: http://www.youtube.com/watch?v=QWmk2XbLaKM]
You keep asking about the "most likely" event, while I'm interested in establishing a reasonable cause for belief in out-performing, i.e. being under-predicted by the tables, etc.
You said "of course" to my question about game history having zero predictive value. Or did you mean "of course it DOES have predictive value"?
I think one can believe X (e.g. Anand gets the first 4 out of 5), and also believe Y (e.g. Anand gets the second 4 out of 5), and yet not believe the conjunction of the two beliefs (Anand gets 8 out of 10). Now, I actually do think 8/10 is the "most likely" result, but I don't think I'm committed to believing that on the grounds of X and Y. Maybe this is our difference. You might hold that believing X and believing Y commits us, logically speaking, to believe in the conjunction "X and Y."
(You don't buy lottery tickets, do you?)
I mean that game history does have predictive value. In fact, FIDE's prediction relies on game history.
"You're right about the FIDE corrections/improvements. But they don't come from real games; they are independent of real games."
But who's to say they are over- or under-estimating the values? Without further information, how can we lean in one particular direction? Single-case anomalies aren't convincing (and in fact, 2 of your examples show an inflated score for the higher-rated player, Kasparov). If the weather forecast tells me 60% chance of rain, I might wonder about how precise that estimate is. I might interpret that prediction as 60% give or take. But I can't lean in one direction and claim that the 60% is too high, not without more information. If I have to guess, I might as well flip a coin. In short, we can't lean in one direction unless there's something concrete to back that up. I'll take that 60% with a grain of salt, but I've no reason to say that it's wrong in a certain direction.
Earlier on you brought up an interesting example of four 2750's vs 2500's. I just realized that we can apply this to the hypothetical Kosteniuk-Anand 10-game match. I did a calculation (it can be easily confirmed, since this is just a binomial distribution). Assuming an 83% winning chance for Anand in each game (and a 17% losing chance, so ignoring draws, exactly the same as in your example), the chance of Kosteniuk scoring exactly 4/10 is 5.7%. The combined chance of her scoring 4/10 or more is 7.4%. In both examples, I'm actually not too surprised. In your initial example, it's perfectly reasonable to score 1/4, since the prediction is 1/5, so no surprises there. Obviously we've simplified it a lot, but since we've used this on that 2750-vs-2500 example, I thought let's just try this on the Kosteniuk-Anand case.
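If anyone wants to check those two figures, a minimal sketch of the binomial calculation follows. It makes the same simplifications as the post above (each game is an independent win/loss with a 17% chance for the lower-rated player, no draws); the function names are just mine.

```python
from math import comb


def prob_exact(n: int, k: int, p: float) -> float:
    """Probability of exactly k wins in n independent games, each won with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_least(n: int, k: int, p: float) -> float:
    """Probability of k or more wins in n independent games."""
    return sum(prob_exact(n, j, p) for j in range(k, n + 1))


p_game = 0.17   # assumed per-game winning chance for the lower-rated player
n_games = 10    # length of the hypothetical match

p_exactly_4 = prob_exact(n_games, 4, p_game)
p_4_or_more = prob_at_least(n_games, 4, p_game)

print(f"exactly 4/10: {p_exactly_4:.1%}")
print(f"4/10 or more: {p_4_or_more:.1%}")
```

This reproduces the 5.7% and 7.4% quoted above. Allowing for draws would change the numbers, so treat it as a rough check only.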
Now this (5.7%) is helpful -- lower than I would've guessed, but helpful all the same.
I also think there is more than one way in which we've been talking past each other. The first way had to do with the issue of "what's most likely" -- we agree Anand wins 8-2, and that's what I've said I'd put my money on if I had to bet on a score. [This is in the absence of game history stats between the two -- if we take the blitz games as some kind of admittedly weak indicator, she's 50% recently, although in fairness I must give an update on the World Blitz Championship: Carlsen won, Anand was 2nd (3 points back and 3 points ahead of 3rd), and Kosteniuk was dead last, yikes.] In saying this, I've been basically agreeing with the predictive value of the (FIDE) ratings system.
Now you've been asking about the asymmetry in my claims -- i.e. why the underprediction claim without the corresponding overprediction claim that is equally likely on the ratings model? For betting purposes, I would have to consider the two equally likely.
However, for the purposes of justifying the claim that Kosteniuk could indeed go 4-6 against Anand in a match, I don't need to do the same. I take it that if Kosteniuk were to ever do this even once, she will have proven herself capable of doing this, i.e. we would all agree that Kosteniuk could go 4-6 against Anand. [That may be a truism, but for clarity's sake, I felt I needed to state it.] This might be a second way in which we've been talking past each other, with you focusing on "betting" strategy and me focusing on "eventual proving" strategy. Suppose we're asking the question: Could Hikaru Nakamura ever win the world championship? (The real one, not 960/Fischer Random.) Suppose he makes the attempt every year for 19 years, failing each time. He wins it once (1 in 20 tries = 5% success rate), thereby proving that he could indeed win the world championship. Those who insisted it was realistic that he'd win it one year (not knowing when, and never betting on him in any year) are not lucky in their overall assessment of his chances -- rather, the probability of the event occurring once eventually was much higher than for it occurring in a given year. Simple probability, in terms of "[event occurring]/[opportunities for the event occurring]" would be 1/20 = 5%. Not a high probability at all. Which time/year would I have bet on him to win it? None. But will he have established beyond doubt that he could win it? Yes. Something parallel applies with Kosteniuk, or any player, but the strength of the player (especially relative to particular opponents) makes the probability higher or lower. The reason I brought up the example of the male and female GMs playing 4 games was to establish that probably one of them would result in a draw or win ("holding her own"), even though we might be foolish in terms of probability to put money on any of the games, or on any such matches, given a big ratings difference.
You could say that this idea applies to anyone, even sub-1000 players, and in principle this is true. But as I said with the Fischer-Larsen case, there's a big difference in degree here. It would take an unrealistically high number of games/matches before a weak player even draws once against a GM; not so with the Kosteniuk-Anand match. Apparently she'd likely do it in less than 20 chances, not so unrealistic. [Hey, now all we need is a rule forcing the (men's) world champion to play the women's world champion in a world championship match, and chances are good that Kosteniuk could pull off what some here think is impossible! (Kidding, but if anyone wants to take that seriously, I'd beg them to do it in a different forum/discussion.)]
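To put rough numbers on the "once eventually" idea, here is a small sketch. It assumes every attempt is independent and has the same chance of success, which is a big simplification; the 5% figure is the hypothetical yearly rate from the Nakamura example and the 7.4% figure is the "4/10 or more" chance computed earlier in the thread.

```python
def prob_at_least_once(p_single: float, attempts: int) -> float:
    """Chance that an event with per-attempt probability p_single
    happens at least once in the given number of independent attempts."""
    return 1.0 - (1.0 - p_single) ** attempts


# Hypothetical: 5% chance per year, 20 yearly attempts.
p_title = prob_at_least_once(0.05, 20)

# Hypothetical: 7.4% chance per 10-game match of scoring 4 or more, 20 matches.
p_match = prob_at_least_once(0.074, 20)

print(f"at least one success in 20 tries at 5%:   {p_title:.0%}")
print(f"at least one success in 20 tries at 7.4%: {p_match:.0%}")
```

Under those assumptions the "at least once" chances come out around 64% and 79% respectively, even though no single attempt is anywhere near a good bet.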
Maybe (your answer to) this question will clarify something -- not about Anand and Kosteniuk, but about the predictive value of ratings versus other factors: If player X is rated 2517, and player Y is rated 2788, it would seem that Y is a heavy favorite. But if their game history were heavily imbalanced in favor of the lower-rated player (who won about 80% of the time), should Y still be considered such a heavy favorite? I don't even mean to imply that X is underrated in an overall sense, or that Y is overrated in an overall sense, nor even that X is some kind of "giant killer" who plays better against 2700s generally while losing to 2300s. X may simply match up well against Y. Again, I don't bring this up to justify claims about Kosteniuk directly, but I'm asking this question to see if we disagree about the criteria and their predictive value.