Databases: Tips, Tricks, and Software

Jun 2, 2014, 11:08 AM

Note: Software recommendations are at the bottom. Also, all images are puzzles. Have fun!

A topic that regularly comes up with my students is the use of databases. Databases are obviously one of the great resources available to a chess player today, but many players I know make little (or poor) use of databases. Many people (most recently Nakamura) have said that Kasparov's great strength was opening preparation. While I would argue he was also a superior player in other regards, a part of Kasparov's success can certainly be attributed to his rapid embrace of chess technology such as chess databases. Prior to chess databases, chess players would have to lug around plentiful chess books and thick notebooks full of personal analysis. At one point, players were much more independently responsible for crafting their own databases. A better database was a critical piece of intellectual property which conferred a serious advantage to the possessing player. In End Game, Dominic Lawson talks about the importance of databases and computers in the 1993 match between Kasparov and Short. Short's second, GM Lubomir "Lubosh" Ftacnik (if memory serves), was selected partly for his database and software skills. These days, all chess players have plentiful access to databases so it is more important to know how to get the most out of your database than to have the better database.


Kasparov vs. Short in 1993

Despite the prevalence of chess databases in modern chess play, there is little writing that I am aware of on the topic. GM John Nunn addresses the topic in his heartily recommended book "Secrets of Practical Chess" and IM Kongsted has a section in his interesting, but rather dated, book "How to Use Computers to Win at Chess".

A database is simply a collection of chess games or possibly chess positions. In chess, databases have a few primary functions.

  1. Databases are used to review statistics on specific opening moves. What percentage of the time is e4 played vs. d4? How does each move fare? What was the last year in which this move was played? What is White's average performance rating in this line? Is that better or worse than White's average actual rating? This sort of information is presented in a "tree" which is discussed below.
  2. Databases are used to look up specific games. Sometimes someone may want to locate a historical game or all the World Championship games or something similar. More often, databases are used to locate the games played by a specific individual. If your opponent's games are in the database, it can be a large advantage to know what opening lines he plays and to prepare for them. Most of us aren't at the elite level where novelties are prepared to catch a player out 20 moves into a previously played line; however, it can still be very useful to show up for a game knowing that the opponent plays the Dragon Sicilian as Black. If nothing else, we feel a little more mentally prepared and at ease. Having at least reviewed the basic Dragon, we can save a few minutes of uncertainty in the opening.
  3. For individuals, databases are also used to store one's own games, game collections, and analysis. I won't really discuss this use here.
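
To make the first use concrete, here is a minimal sketch of how a tree might be computed from a pile of games. It is not how any particular database program works internally; the game list, the `move_tree` function, and the field names are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: (moves in SAN, result, year). These are not real games.
GAMES = [
    (["e4", "e5", "Nf3"], "1-0", 2010),
    (["e4", "c5", "Nf3"], "1/2-1/2", 2012),
    (["d4", "d5", "c4"], "0-1", 1998),
    (["e4", "e5", "Bc4"], "0-1", 2005),
]

# Points scored by White for each PGN-style result string.
WHITE_POINTS = {"1-0": 1.0, "1/2-1/2": 0.5, "0-1": 0.0}


def move_tree(games, prefix):
    """Summarise every move that was played immediately after `prefix`."""
    stats = defaultdict(lambda: {"games": 0, "points": 0.0, "last_year": 0})
    n = len(prefix)
    for moves, result, year in games:
        # Skip games that end before the position or took a different path.
        if len(moves) <= n or moves[:n] != list(prefix):
            continue
        entry = stats[moves[n]]
        entry["games"] += 1
        entry["points"] += WHITE_POINTS[result]
        entry["last_year"] = max(entry["last_year"], year)
    return {
        move: {
            "games": e["games"],
            "white_score": e["points"] / e["games"],
            "last_year": e["last_year"],
        }
        for move, e in stats.items()
    }


tree = move_tree(GAMES, [])
for move, row in sorted(tree.items(), key=lambda kv: -kv[1]["games"]):
    print(f"{move}: {row['games']} games, "
          f"White scores {row['white_score']:.0%}, last played {row['last_year']}")
```

Calling `move_tree(GAMES, ["e4", "e5"])` would give the same kind of summary one level deeper, which is all that clicking through a tree really does.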

These days, there is an unfortunate tendency to consider the top database line or lines as gospel. It is critical to realize that databases don't tell you which move is "correct". Databases only tell you which moves have been played. The moves in the database were ultimately played by flawed humans (mostly). It is perfectly possible for a blunder to have been played five times, but punished only once. Alternatively, a strong move may have been played in five games and given an advantage to White, only for the five White players to follow up poorly and lose or draw the games.

There's a lot of data that can be gleaned from the statistics in the database. There are also a lot of mistaken interpretations out there. Consider the following database tree from the Schliemann Defense to the Ruy Lopez. There are a number of statistical aspects we can focus on here.


Database tree for the Schliemann Defense - NOT a puzzle ;)

  • Sample Size - According to the tree, White's "best" move is Qe2. However, there are a few causes for concern. The primary issue is that there are only 26 games. That's not a tiny sample size, but it's far too small to draw any serious conclusions. Another issue is...
  • Year Played - Lines come and go. In the previous example, the average year Qe2 was played was 1972. That's far older than the other moves. The Schliemann Defense has made massive theoretical strides in recent years. Games from 1972 should be considered of secondary importance in assessing the modern viability of the line. Another issue to watch for in the dates is refutations. A line might be played 20 times with a 60% score, but then Black could uncork a devastating novelty which reverses the assessment of the position. The line may not have been played since the novelty. Consequently, the statistics of the line will still be good, but in fact, the line is no longer viable.
  • Rating - It is important to consider the strength of the players in the assessment. If a line is constantly ventured by 2200 players against 2400 players, the statistics will always be bad. This actually happens fairly often with some less frequently played lines. They may be more popular at lower levels, and consequently, the proponents of the line are often outrated. Alternatively, some trendy lines may be rapidly taken up by well informed, higher rated players. This can lead to a big rating advantage for proponents of those lines.
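
The sample size and rating points can be put into rough numbers. The sketch below is only an illustration under stated assumptions: the Wilson interval treats every game as a plain win or loss (it ignores draws, so it is a rough guide at best), the 17 points out of 26 games is a made-up score since I am not quoting the tree's actual percentage, and the function names are my own. The Elo expected score formula is the standard one.

```python
import math


def wilson_interval(points, games, z=1.96):
    """Rough 95% interval for the 'true' score, treating games as win/loss trials."""
    if games == 0:
        raise ValueError("need at least one game")
    p = points / games
    denom = 1 + z * z / games
    centre = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return centre - half, centre + half


def expected_score(rating, opponent_rating):
    """Standard Elo expected score for the player rated `rating`."""
    return 1 / (1 + 10 ** ((opponent_rating - rating) / 400))


# Hypothetical: a move that scored 17 points out of 26 games (about 65%).
low, high = wilson_interval(17, 26)
print(f"26 games: true score plausibly between {low:.0%} and {high:.0%}")

# A 2200 player facing a 2400 player is expected to score well under 50%,
# so a poor result for "their" opening line may say little about the line itself.
print(f"Expected score for 2200 vs 2400: {expected_score(2200, 2400):.0%}")
```

With only 26 games, the interval is wide enough to include scores below 50%, which is the point of the sample size warning. Likewise, a line that scores 30% when its proponents are outrated by 200 points is actually doing better than the ratings alone would predict.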

When deciding which database to use, there are some tradeoffs to consider.

  • Size/Quality/Speed - It is easy to think that bigger is better for databases. 10,000,000 games is better than 1,000,000, right?! In some cases yes, but larger databases require more hard drive space and more processing power. It may often be preferable to use a database with fewer games to be more efficient. Also, smaller databases are usually restricted to stronger players. 1,000,000 IM/GM games may be preferable to 10,000,000 games if half of those are played by players below 2200.
  • Offline/Online - Online databases are new and readily available. I suggest some below, but there are also plenty that I did not include. Online databases have a lot of advantages. Despite the size, they don't tax your computer. All the storage and processing are handled server side. Online databases are often free, and they are also more user-friendly. On the other hand, offline databases have many advantages as well. They have far more functions. You can filter and customize the databases to a greater extent. You will more easily be able to play through and analyze the games. Offline databases also provide better statistical information about the games and players.
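
As an example of the kind of filtering an offline database allows, here is a rough sketch that trims a PGN file down to games where both players are rated above a threshold. It uses only the Python standard library and the standard PGN `WhiteElo` and `BlackElo` tags. The helper names and the sample games are invented for illustration, and a real database program will do this far faster and more robustly.

```python
import re

# Matches a PGN tag pair such as: [WhiteElo "2510"]
TAG_RE = re.compile(r'^\[(\w+)\s+"(.*)"\]\s*$')


def split_games(pgn_text):
    """Split PGN text into games, assuming each game starts with an [Event ...] tag."""
    games, current = [], []
    for line in pgn_text.splitlines():
        if line.startswith("[Event ") and current:
            games.append("\n".join(current).strip())
            current = []
        current.append(line)
    if any(part.strip() for part in current):
        games.append("\n".join(current).strip())
    return games


def headers(game_text):
    """Collect the tag pairs of one game into a dict."""
    tags = {}
    for line in game_text.splitlines():
        match = TAG_RE.match(line.strip())
        if match:
            tags[match.group(1)] = match.group(2)
    return tags


def rating(tags, key):
    """Return the rating as an int, or 0 when it is missing or not a number."""
    value = tags.get(key, "")
    return int(value) if value.isdigit() else 0


def filter_by_rating(pgn_text, min_elo=2200):
    """Keep only games in which both players are rated at least `min_elo`."""
    kept = []
    for game in split_games(pgn_text):
        tags = headers(game)
        if min(rating(tags, "WhiteElo"), rating(tags, "BlackElo")) >= min_elo:
            kept.append(game)
    return kept


# Hypothetical two-game PGN, not taken from any real event.
SAMPLE = """[Event "Example Open"]
[White "Player A"]
[Black "Player B"]
[WhiteElo "2510"]
[BlackElo "2450"]
[Result "1-0"]

1. e4 e5 2. Nf3 Nc6 1-0

[Event "Example Open"]
[White "Player C"]
[Black "Player D"]
[WhiteElo "1900"]
[BlackElo "2300"]
[Result "1/2-1/2"]

1. d4 d5 1/2-1/2
"""

strong = filter_by_rating(SAMPLE, min_elo=2200)
print(len(strong), "of", len(split_games(SAMPLE)), "games kept")
```

Games with a missing rating are treated as unrated and dropped, which is a design choice you may or may not want; older historical games often have no rating tags at all.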

Offline Databases and Software

SCID/SCID vs. PC - SCID and its fork SCID vs. PC are great FREE programs which will handle all your database needs. I use SCID vs. PC regularly, and I have crafted some brief tutorials on some of the most basic functions I encourage my students to use. In my opinion, the only concern with SCID vs. PC is that it isn't as graphically polished as its competitors. Otherwise, all the functions are there, it runs on all operating systems, and it is fast and efficient.



Opening Master - It has been some time since I used OpeningMaster, but it seems to be coming along very well. OpeningMaster's largest database has 8.7 million games including 1.3 million correspondence games. A good correspondence game database is particularly useful in certain extremely sharp opening lines such as the Poisoned Pawn Variation in the Najdorf. Some of these variations have proved true battlefields in correspondence chess, and the theory extends much deeper than OTB theory. OpeningMaster has multiple tiers of service, and includes an automatic update model which utilizes SugarSync. The priciest tier is the platinum tier which is 59.90.

ICOFY - Ingo's Chess Offer for You is a FREE database maintained by Ingo Schwarz. The database has over 5 million games and is regularly updated. Ingo also supports multiple formats so it is easily used in multiple database programs. I highly recommend ICOFY. Although the database is free, donations are accepted. Currently, all proceeds are passed on to the German Fibromyalgia Association.

MegaBase and ChessBase - ChessBase 12 is ChessBase's software for managing databases. ChessBase is inarguably the gold standard in chess database software. It has all the functions you might want. ChessBase is priced at €99.90. MegaBase is ChessBase's companion database, which has 5.8 million games and 67,000 analyzed games. It is priced at a less than meager €159.90. I no longer have a license for ChessBase or MegaBase so I cannot confidently review them, but there are exemplary YouTube videos on the product pages, which I have linked.

Online Databases

Chess.com Game Explorer - Obviously, the Chess.com Game Explorer is a handy resource for those who have platinum or diamond memberships. The Game Explorer provides Win, Loss, and Draw statistics for the given moves, but doesn't provide other statistics. You can also use the search to try to find specific games, but I found this to be iffy. For instance, I tried to find games that Carlsen had played in the Alekhine Defense, and it wouldn't accept an empty search term for Player 1. There isn't a lot of information or help available for the database. It seems to have about 1.5 million games, but beyond that I am clueless. Personally, I really appreciate having the explorer easily accessible for "Daily" chess, but I use offline systems when I really need it.



365Chess - 365Chess claims the biggest opening database online. They have around 3.5 million games which does validate their claim. They have free and premium models. Premium features include abilities to create notes, to download the games, and to create your own databases.



ChessTempo - ChessTempo's free database contains some nice features. Extra features include quick filtering by rating, as well as average, performance, and maximum ratings for the moves played in the database. ChessTempo's database seems similar in size to Chess.com's Game Explorer with about 1.5 million games.



Online ChessBase Database - The online ChessBase database was only made available this year. It is unquestionably the gold standard for online databases with over 6.5 million games, and far more features and a better interface than any other online database. Features include providing the last year played and the average Elo in the statistics, a good interface for playing through the games, and extremely quick and accurate searching. For instance, just type "Carlsen Radjabov" to pull up all the games between the players. My only gripes with the ChessBase offering are associated with ChessBase itself. I don't trust them to support the database for free indefinitely.

ChessGames.com - ChessGames.com is rather a unique service. To an extent, I think of ChessGames.com as Wikipedia for chess games. ChessGames.com only includes an opening explorer as a premium feature so it's a very poor choice for opening work, but it is an excellent database for research on games, players, and tournaments. It is much easier to find individual games than in the other services mentioned. My favorite thing about ChessGames.com is that everything is so well linked. Say that you are looking at a game between Shirov and Eljanov and you would like to know more about Eljanov. You can click his name above the board, and you are immediately taken to his page which has an excellent biography, a picture, and a selection of notable games and tournaments. You can also click the listed games and tournaments to pull up their pages. I find that, like with Wikipedia, it is very easy to get lost on ChessGames.com by following the links through from interesting game to interesting player and back again!



Mobile Databases

ChessBase - As might be expected, ChessBase's Android offering is clearly the premier offering on mobile. The prices have been increasing; on Android, the most recent version is priced at $10.49. The app has a number of features, and provides mobile access to around 6.5 million games. I assume that the database is the same database used in the online ChessBase Database. A limitation is that the database is only available when one has internet access. Probably, that is true for most people most of the time, but for the chess traveler, it might be nice to have a mobile database that is not dependent on a speedy data connection.



Scid on the go - Scid on the go is based partially on the open source code from SCID. The app is available for FREE on Android, and provides many excellent features including the ability to access engine analysis. With Scid on the go, you can work with any databases you have created. If you load in a database, you needn't have internet access to use your database on the go. A major limitation of Scid on the go is the lack of an opening tree. Several have called for such a feature, and you can add your voice to the throng here.



I am sure there are many other excellent databases out there, and I am also quite sure I am missing many interesting tips and tricks. Any suggestions? What do you use and why? :)

