Assessing players' performance in a single match

Avatar of blitzjoker

When I said that a rating for a particular match would not be meaningful, I meant that it would only give a rough approximation of player strength, as everyone plays at a slightly different level in each game.  Clearly you would need to average over a few games to get a good estimate of a player's strength.

However, I'd go as far as to say that in a year or two a chess program like Stockfish would be able to assign a fairly accurate ELO rating to players based on their games, if the programmers decided to include such a feature (which someone will, as it is interesting).  It would just need to assign scores to candidate moves based on an algorithm of some sort.  It could be calibrated by analysing games played by participants whose ELO ratings are already known.
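
One way to picture the idea in the post above, as a rough sketch only: score each move by how much evaluation it gives up compared with the engine's preferred move, average that loss over the game, and map the average to a rating using a line fitted on games by players with known ratings. Everything here is an assumption for illustration: the function names are hypothetical, the calibration numbers are made up, and a straight-line fit is just the simplest possible model, not something Stockfish provides.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def average_loss(move_losses):
    """Mean evaluation given up per move (centipawns) versus the engine's top choice."""
    return mean(move_losses)

# Made-up calibration points: (average centipawn loss, known rating).
# Real calibration would need many analysed games from rated players.
CALIBRATION = [(15, 2600), (25, 2300), (40, 2000), (60, 1700), (90, 1400)]

def estimate_rating(move_losses, calibration=CALIBRATION):
    """Map a game's average loss to a rating using the fitted line."""
    a, b = fit_line([c[0] for c in calibration], [c[1] for c in calibration])
    return a + b * average_loss(move_losses)

# Hypothetical per-move losses for one player in one game.
print(round(estimate_rating([5, 0, 30, 12, 80, 0, 20])))
```

As the first paragraph of the post says, a single game gives only a rough figure; averaging the losses over several games before estimating would be the natural next step.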

Incidentally kleelof, if you think a good chess player could evaluate a player's ELO with some accuracy, why do you think a computer with a much higher rating could not do the same?

Avatar of fabelhaft

"You can only play against the opponent before you, fabelhaft. If Fischer demolished his opponent then surely it is possible to say he played well?"

Didn't I say exactly that? But to state that Byrne played 2675-level chess in the game, as the analysis in the linked article claims, sounds very high. This was a player who would become an IM only years later, and who played so badly as to get a lost position with White very early in the game. Spassky was 2660 when he played the match against Fischer.

Avatar of ColonelKnight

Blitzjoker, any assessment the computer makes would have to be based on a huge amount of data to peg the players at some specific level. Left to its own devices, software can only comment on things like inaccuracies and blunders, looking so many moves ahead at how the game is evolving.

By the way, there is a forum thread on black magic to control husbands.

Avatar of blitzjoker
ColonelKnight wrote:

Blitzjoker, any assessment the computer makes would have to be based on a huge amount of data to peg the players at some specific level. Left to its own devices, software can only comment on things like inaccuracies and blunders, looking so many moves ahead at how the game is evolving.

 

By the way, there is a forum thread on black magic to control husbands.

Indeed, and that data is readily available, and computers are good at managing huge amounts of data.

Avatar of ColonelKnight

Oh, I just meant that you'd have to source that data from somewhere. That'd be the issue. Of course, free software like R can help you analyze data once you have it. My point was getting the data. CdotC could charge big time for that.

Avatar of kleelof
blitzjoker wrote:

 

Incidentally kleelof, if you think a good chess player could evaluate a player's ELO with some accuracy, why do you think a computer with a much higher rating could not do the same?

Good question.

A human and a computer look at positions in very different ways.

A human player, in general, uses experience as a major factor in their calculations. A computer, on the other hand, only calculates. A computer never thinks "Morphy was in a similar position and made this move."

A human player has a sense of skill based on their experience. A computer has no such sense. In the end, its calculations are based on cold, hard numbers.

It is this sense of skill that allows a human to build a context around a game or set of moves. This context allows them to determine, with some reasonable accuracy, the skill involved in a game or set of moves.

Since a computer has no sense of skill, it can, for the most part, only determine whether one move is technically better than another. Once the move is over, the computer no longer considers it, so no context can be built around an overall game or set of moves.

It is this difference that allows an experienced chess player to look at a game or set of moves and say that they have the qualities of a specific rating or range of ratings.

When considering these ideas, you must remember: Elo is not a measure of skill. Elo is a system meant to rank people based on winning and losing only. Therefore, it can only be applied to a specific pool of players.

Many times I've seen people post about how they beat some computer software with an Elo of 1500, or some other impressive number, and brag that they are only 1200. This happens because the human player's rating comes from a different pool than the artificial rating assigned to the computer's play.
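
For readers who have not seen it written out, here is a minimal sketch of the standard Elo expected-score and update formulas. The K-factor of 20 is an assumption for illustration (federations and sites use different values), and the function names are hypothetical.

```python
def expected_score(rating_a, rating_b):
    """Expected score (between 0 and 1) for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update(rating, expected, actual, k=20):
    """New rating after one game; actual is 1 for a win, 0.5 for a draw, 0 for a loss."""
    return rating + k * (actual - expected)

# Only the rating difference enters the formula, never the absolute numbers.
print(expected_score(1200, 1500))
print(expected_score(2200, 2500))  # same 300-point gap, same expectation

# A 1200 player beats a 1500 player (assumed K = 20).
print(update(1200, expected_score(1200, 1500), 1))
```

Because only differences and results go in, the numbers mean something only among players who actually play each other. A "1500" assigned to a program that never played in the human's pool is not comparable to the human's 1200.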

I hope I am making my point clear. This is a very complex subject and it is made even more complex by the fact that most people don't really understand what Elo is, how it works and what it actually measures.

BTW, I would like to point out that it is Elo, not ELO. It is named after Arpad Elo (http://en.wikipedia.org/wiki/Arpad_Elo). ELO, on the other hand, is a band from Birmingham, England. And, as far as I know, they have made no contributions to the world of chess. :)

Avatar of blitzjoker

There is plenty of data around for top players; maybe for under 2000 it might be trickier, but I daresay there is still enough to have a good stab at it already.

In fact it sounds an interesting project.  I did read some research a while ago looking at estimating historic ELOs.  That sounds entirely possible, and I expect we'll see it in the next few years.  Just think of all those threads we'll miss when Stockfish finally decides whether Fischer is better than Carlsen, or Capablanca, or Morphy.  And it could settle the question of whether there is ELO inflation or not. :)

Avatar of ColonelKnight

Computers can mimic human evaluation by examining patterns in data and pegging moves to skill level. Humans cannot beat the bell curve! And OP's need for objective evaluation does seek his deviation from the mean! Whoopee-do.

Avatar of ColonelKnight

@blitzjoker, much as I'd like to see those threads go away, stats is bad at dealing with outliers by the very definition of the "normal curve", so, no, don't expect a definitive answer on Carlsen vs. Capablanca.

Cool project though for math brains!

Avatar of blitzjoker

Apologies to Mr Elo.

I suppose if we started again without Elos, then ratings based on computer assessments might be the way to go as in theory they could be based round a stable benchmark. 

Any computer assessment now of a player's historic Elo rating would have to be based on the Elo ratings of current players, which may in general be higher (or indeed lower) than ratings were in the past, due to the factors you mention about a pool of players.  Hence Fischer's Elo rating in today's world may be higher or lower than it was when he was actually playing.

Regarding the human capacity to look at moves in context, this is a nice concept, but a perfect calculating machine will triumph in the end, and the best move will be the best move.  We're not there yet, but (sadly perhaps) it is not far enough off to make much difference.

ELO were never as good after Roy Wood left I think; they peaked with the 10538 Overture in my mind, but I know I'm pretty much out on my own there. :)

Avatar of blitzjoker
ColonelKnight wrote:

@blitzjoker, much as I'd like to see those threads go away, stats is bad at dealing with outliers by the very definition of the "normal curve", so, no, don't expect a definitive answer on Carlsen vs. Capablanca.

 

Cool project though for math brains!

Actually I'm a mathematician and data scientist when I'm not playing third rate chess games.

Avatar of kleelof
blitzjoker wrote:

Regarding the human capacity to look at moves in context, this is a nice concept, but a perfect calculating machine will triumph in the end, and the best move will be the best move.

Sure, that is correct.

But we were discussing the ability to evaluate a game or set of moves and assign it an Elo value that is credible in a pool of players, not the ability of computers to crush players.

In theory, chess software can have an Elo if it goes to tournaments and plays enough games to accumulate a rating. And, with absolutely no doubt, most programs would attain a great Elo.

But, in the end, all those moves in all those games would just be a long string of calculated values with no basis in skill.

But, as you say, it is not outside the realm of possibility that, at some point in the near or far future, computers will be able to develop a sense of skill and the ability to evaluate play the way an experienced human can.

But I still won't play with them. :)

Avatar of ColonelKnight

Stephen Hawking also hates AI.

Avatar of VLaurenT

There are at least two things I can think of that humans do much better than computers when evaluating games:

  • a computer doesn't know how difficult a variation is: it makes no distinction between simply taking a piece (+3, thank you) and finding a 7-move combination that wins a piece (+3, darn!); for less experienced players, it makes no distinction between hanging a piece (putting your knight en prise!) and walking into a difficult 7-move combination that loses a piece (-3 all the same)
  • a computer doesn't appreciate how well you manage to induce your opponent into making mistakes: it will always prefer a +0.11 eval that leads to a certain draw against a lower-rated player over a -0.5 eval that gives you good winning chances...
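
The second point in the list above can be made concrete with a small sketch. The two lines and their win/draw probabilities are invented purely for illustration; they stand for a player's practical chances against a weaker opponent, which is exactly the information the raw evaluation does not contain.

```python
def expected_points(p_win, p_draw):
    """Expected game score: a win counts 1, a draw counts 0.5."""
    return p_win + 0.5 * p_draw

# Hypothetical candidate lines: engine eval in pawns plus assumed practical chances.
lines = {
    "+0.11, forced draw": {"eval": 0.11, "p_win": 0.00, "p_draw": 1.00},
    "-0.50, complicated": {"eval": -0.50, "p_win": 0.45, "p_draw": 0.30},
}

# Choosing by raw evaluation, as the post describes the computer doing.
engine_choice = max(lines, key=lambda name: lines[name]["eval"])

# Choosing by expected score against this particular opponent.
practical_choice = max(
    lines,
    key=lambda name: expected_points(lines[name]["p_win"], lines[name]["p_draw"]),
)

print("by evaluation:    ", engine_choice)
print("by expected score:", practical_choice)
```

With these assumed numbers the two criteria pick different lines, so an assessment built only on evaluations would mark the practical choice as the weaker move.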
Avatar of VLaurenT
blitzjoker wrote:

 

I suppose if we started again without Elos, then ratings based on computer assessments might be the way to go as in theory they could be based round a stable benchmark. 

(...)
 
Depends on what you want to measure: objective 'correctness' of moves played, or ability to outplay opponents. At the moment, computers can't measure the second.
Avatar of kleelof
ColonelKnight wrote:

Stephen Hawking also hates AI.

I don't really have a problem with AI. I'm a programmer, and, I imagine, like most programmers, I have an interest in AI.

I just don't like to play chess with computers.

To me, part of the 'charm' of chess is human error. Finding an error, preferably, in my opponent's play or, to my displeasure, in my own play.

At my current level of play, a computer has to either intentionally make a mistake or play below its ability. In either case, it equates to patronizing.

I leave patronizing me to my wife. :)

Avatar of ColonelKnight

Matching wits with you is also best left to the missus :). The Hawking thing is the latest bit of web trivia ... huff post 2 days back ... http://www.huffingtonpost.com/2014/05/05/stephen-hawking-artificial-intelligence_n_5267481.html

Avatar of kleelof

Funny article.

I think he is just doing that so that if it DOES go the way he is saying, he will look like a hero.

Stephen Hawking does have a bit of an ego. :)