DB Quality?

Avatar of NimzoRoy

The quote below was made TO ME, NOT by me!

I sure don't agree with it, but I'm curious to see what other players think. The quote is from a discussion of using a homemade DB consisting solely of games from LSS, IECG, etc., with probably far fewer than one million games, as opposed to something like ChessBase BIG DB 2012 with several million games.

"My point is that CC games will always be of higher quality thanks to engine use (which is supported on the sites I mentioned [i.e. LSS, IECG, etc.]) and long time controls, which combined make the lowliest amateur as strong as a GM. Therefore adding "normal" games can only make the quality [of a DB] lower."

[by "normal games" I presume he means OTB games from DBs such as Chess Base, Chess Assistant etc]

Avatar of andrewlong

Well, there are the obvious arguments, like engines not necessarily making better long-term strategic decisions in the opening, so a master is more likely to find the best continuation/novelty move. There is some point to what he is saying -- a database of GM-centaur games would be better than a regular GM database. Add variables like low-level players who misuse a computer, low-level players who don't use a computer, etc., and I don't trust it, though.

I'm a little confused about the phrase "high-quality" database though. It's not like a high-quality engine vs a low-quality engine, which you can pit against each other and see who wins or loses. The database shows the percentages of who wins, etc., for given moves. It is likely that a difference in win rate between two playable moves is due not to those moves but to subsequent errors that determined the outcome. I suppose there can be high and low quality if you are playing lines that are obscure enough to not have established theory, but if that's the case the line is probably obscure because the game has reached equality, and database lines shouldn't provide any meaningful change in advantage. It's the middle game, time to come up with a plan.

The only other thing I can think of that makes it better is that it can provide better middle game play if you wander that far down a line, since you are essentially still playing a computer's moves while your opponent with his shoddy GM database has been off book for many moves. However, with such a small database, it is unlikely that many lines will stay in the database into the middle game.

Does this person use the database just to choose moves in currently played games, or is it used to study openings, like finding novelties or determining the best opening repertoire? If the latter, there is a chance this could be helpful given more games and a strict construction of the database to only include high-level games. If the former, it sounds like a person who really doesn't want to play chess and would love to win by computer if they could.

Avatar of ori0

Nice answer andrew.

Avatar of Vease

Correspondence chess now is just a battle of 'who has the most powerful rig to run Rybka/Houdini'. The games are 'played' between computers, not humans. Theoretically the analysis should be almost perfect though, so playing over the games is not without value.

Avatar of Aarnos
andrewlong wrote:

Well, there are the obvious arguments, like engines not necessarily making better long-term strategic decisions in the opening, so a master is more likely to find the best continuation/novelty move. There is some point to what he is saying -- a database of GM-centaur games would be better than a regular GM database. Add variables like low-level players who misuse a computer, low-level players who don't use a computer, etc., and I don't trust it, though.

I'm the person who made the comment.

This first point is entirely correct. This is why I try to have as deep an understanding of the openings I play as possible, so that I can notice moves the computer misses. Playing solely with a database and a computer isn't going to get you anywhere. The point about the low-rated players is also true for normal databases. Other than that, there's nothing wrong with it.

"I'm a little confused about the phrase "high-quality" database though. It's not like a high-quality engine vs a low-quality engine, which you can pit against each other and see who wins or loses. The database shows the percentages of who wins, etc., for given moves. It is likely that a difference in win rate between two playable moves is due not to those moves but to subsequent errors that determined the outcome. I suppose there can be high and low quality if you are playing lines that are obscure enough to not have established theory, but if that's the case the line is probably obscure because the game has reached equality, and database lines shouldn't provide any meaningful change in advantage. It's the middle game, time to come up with a plan."

When you check all the games played on FICS, you see that the win percentage for White is close to 50%. Most games played on FICS are blitz and low-quality. So here low quality has warped the percentages, as the result of the game isn't dependent on the move played. If you look at a CC database, you see that the win percentage for White is about 55%. Here the result of the game IS dependent on the moves played. This is the difference between a low-quality database and a high-quality database - the percentages they give for each move. The higher the quality of the database, the closer to the "truth" we get when we look at the percentages. This is why I use a CC database: the games have fewer mistakes and are therefore closer to the "truth".
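To make the percentage comparison concrete, here is a minimal sketch of how White's overall score in a database could be computed from a list of PGN result strings. It assumes "win percentage" means the usual scoring percentage, with a draw counted as half a point; the function name `white_score` and the sample results are made up for illustration and are not taken from FICS or any CC database.

```python
def white_score(results):
    """Return White's score as a percentage over finished games.

    `results` is an iterable of PGN result strings. Unfinished games ("*")
    and anything unrecognised are skipped. Counting a draw as half a point
    is an assumption about what "win percentage" means here.
    """
    points = {"1-0": 1.0, "1/2-1/2": 0.5, "0-1": 0.0}
    scored = [points[r] for r in results if r in points]
    if not scored:
        raise ValueError("no finished games to score")
    return 100.0 * sum(scored) / len(scored)


# Hypothetical sample: two White wins, one draw, one loss, one unfinished game.
sample = ["1-0", "1/2-1/2", "0-1", "1-0", "*"]
print(white_score(sample))
```

Running the same function over two databases would give the kind of side-by-side figure being compared here, though on its own it says nothing about why the two numbers differ.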

"The only other thing I can think of that makes it better is that it can provide better middle game play if you wander that far down a line, since you are essentially still playing a computer's moves while your opponent with his shoddy GM database has been off book for many moves. However, with such a small database, it is unlikely that many lines will stay in the database into the middle game."

I really can't give an answer to this without writing a 1000-word essay so I'll just say that you are mostly correct.

"Does this person use the database just to choose moves in currently played games, or is it used to study openings, like finding novelties or determining the best opening repertoire? If the latter, there is a chance this could be helpful given more games and a strict construction of the database to only include high-level games. If the former, it sounds like a person who really doesn't want to play chess and would love to win by computer if they could."

This person uses it for the latter purpose.

Avatar of Aarnos

Also @Vease

You are far from correct. Eros Riccio, a top-20 player in the world, uses an old quad.

Avatar of pfren
Vease wrote:

Correspondence chess now is just a battle of 'who has the most powerful rig to run Rybka/Houdini'. The games are 'played' between computers, not humans. Theoretically the analysis should be almost perfect though, so playing over the games is not without value.

I'm sorry to say that you are completely wrong.

Avatar of andrewlong

First, excellent avatar image. Second, back to the high-quality discussion: in your opinion, how much does that true percentage matter for decision-making purposes? Perhaps later on in the opening, when theory is less clear-cut, I can see a database showing a move that is slightly better, allowing, for instance, White to keep the opening advantage a bit longer. But if White scores 50% in one database with 1.e4, and more truly 55% as shown in another database, that doesn't really mean that a person should choose 1.e4, since there is so much more involved in the opening.

Anyway, I am a database user, so my opposition is of course hypocritical or devil's advocacy, but I think this database idea is a good one when coupled with opening study so that, again for instance, I don't play a line that is objectively better but goes into an isolated queen pawn setup that I have a high chance of screwing up once off database.

Avatar of Aarnos

True percentage is very important in positions which have been reached in 1-100 games before. Here you have to consider many different candidate moves, and if the percentages are correct then you can save a lot of time (for example, you can look at the better-scoring moves first or prune some badly scoring ones). If you cannot trust the percentages, you have to look at every move in random order. It's mainly a question of efficiency.
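The ordering-and-pruning idea could be sketched roughly like this. It is only an illustration: the `MoveStats` class, the `order_candidates` function, the 40% pruning cutoff, and the sample counts are all assumptions made up for the example, not anything from a real database tool, and a draw is again counted as half a point.

```python
from dataclasses import dataclass


@dataclass
class MoveStats:
    """Results for one candidate move, from the side to move's point of view."""
    move: str
    wins: int
    draws: int
    losses: int

    @property
    def games(self) -> int:
        return self.wins + self.draws + self.losses

    @property
    def score(self) -> float:
        # Scoring fraction: win = 1, draw = 0.5 (an assumed convention).
        return (self.wins + 0.5 * self.draws) / self.games if self.games else 0.0


def order_candidates(stats, prune_below=0.40):
    """Drop moves scoring under `prune_below`, then list the rest best-first.

    The 0.40 cutoff is arbitrary and only for illustration.
    """
    kept = [s for s in stats if s.games > 0 and s.score >= prune_below]
    return sorted(kept, key=lambda s: s.score, reverse=True)


# Hypothetical counts for a position reached a few dozen times.
candidates = [
    MoveStats("Nf3", wins=10, draws=20, losses=5),
    MoveStats("d4", wins=3, draws=4, losses=8),
    MoveStats("Bc4", wins=6, draws=6, losses=3),
]
for s in order_candidates(candidates):
    print(s.move, s.games, round(s.score, 3))
```

With small samples like these, the ordering is only as trustworthy as the games behind it, which is the whole argument for caring about where the games come from.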