Ubuntu Linux and Chess -- Databases and Analysis

Apr 2, 2012, 1:43 AM

In this post I'm going to talk a bit more about scid, its native database format, and some of the things that can be done with it.  I'll start with a few basic concepts.

A few blog posts ago I mentioned the PGN file format.  It's in common use on many chess sites and has become a de facto standard for storing chess games.

The PGN file format is not limited to storing just one chess game, though.  It can store many chess games from a wide range of sources.  A PGN file holding more than one game is simply the individual single-game files concatenated together.  The more Linux-savvy readers will already know that there is a simple command line utility in Linux for concatenating files, called "cat".

So the first step before we begin game analysis is to assemble one or more PGN files of all of the games you want to analyse.  You can, if you prefer, collate a PGN file of all of your games on chess.com (chess.com has a utility to email you a PGN file containing selected games from the online chess window), or you can search for a database of chess games on the internet.

There are many such databases, some of which are listed here:


For example, the following link contains a database of more than 4.7 million chess games, sorted by ECO code, which is free to download:


Of course you can also purchase chess databases, as well as download regular updates to your current databases from sites such as TWIC:


I won't review every possible chess database, except to say that there are some large ones out there and you can pick out the ones you are after.  For example, there are many databases which just contain games sorted by country, so if you want to see what openings are popular amongst your fellow countrymen you can use one of those.

Once you have collated a database, it will be one large PGN file.  If you have many small PGN files you can treat them as separate databases, or concatenate them together using cat:

cat one.pgn two.pgn three.pgn > big.pgn
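A quick sanity check after concatenating is to count the games in the result.  The sketch below assumes every game starts with an [Event "..."] tag line, which the PGN standard requires; the count_games helper and the two sample files are made up for illustration.

```shell
# Count the games in a PGN file by counting its [Event "..."] tag lines.
# Assumption: each game begins with an Event tag, as the PGN standard requires.
count_games() {
    grep -c '^\[Event ' "$1"
}

# Demonstration with two hypothetical one-game files in a scratch directory.
tmp=$(mktemp -d)
printf '[Event "Game A"]\n[Result "1-0"]\n\n1. e4 e5 1-0\n\n' > "$tmp/one.pgn"
printf '[Event "Game B"]\n[Result "0-1"]\n\n1. d4 d5 0-1\n\n' > "$tmp/two.pgn"
cat "$tmp/one.pgn" "$tmp/two.pgn" > "$tmp/big.pgn"

count_games "$tmp/big.pgn"   # prints 2
```

If the count for big.pgn is not the sum of the counts for the individual files, something went wrong in the concatenation.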

You need to use a command line program called pgnscid to convert that into scid's native database format.  pgnscid works simply like this:

pgnscid big.pgn

pgnscid will create 3 files, with extensions sg4, si4 and sn4.  These are the database files and the indexes which allow scid to find games faster.
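If you want to confirm that the conversion produced everything scid needs, a small check like the one below will do.  It is only a sketch: scid_base_complete is a hypothetical helper, and it assumes the three files share the base name of the PGN file (big.sg4, big.si4 and big.sn4 for big.pgn).

```shell
# Report whether all three scid database files exist for a given base name.
# Assumption: the files are named <base>.sg4, <base>.si4 and <base>.sn4.
scid_base_complete() {
    for ext in sg4 si4 sn4; do
        [ -f "$1.$ext" ] || { echo "missing: $1.$ext"; return 1; }
    done
    echo "complete: $1"
}
```

After running pgnscid big.pgn, you would call it as scid_base_complete big from the same directory.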

The next thing to do is start scid, and use the menu option File -> Open base as tree to load your database.

The file you are looking for is the one with the .si4 extension.  Note that you can actually load a PGN file directly into scid using this menu option but it's not nearly as fast and efficient as loading the native scid database.

Now you will see a new tab in scid's list of tabs on the right-hand side of the scid main menu: the Tree Window.  In that window you can see the list of possibilities for the game's next move, how frequently each one occurs in the database, the ECO code if present, and the score.  The score for a move is the percentage of wins for white with that move, with a draw counting as 50% of a win.  On that basis white should have an average score of 53.8% (according to a statistic I found in the scid help file; I have no idea how accurate that is), so a move with a higher score than that is good for white (according to the database), and a move with a lower score is good for black.
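To make the score calculation concrete, here is a small sketch of the arithmetic described above: wins count as 1, draws as 0.5, and the total is expressed as a percentage of all games.  The white_score helper and the game counts are invented for illustration; this is not how scid itself is implemented.

```shell
# Score for white as described above: a win counts 1, a draw counts 0.5.
# Arguments: white wins, draws, black wins (hypothetical counts).
white_score() {
    awk -v w="$1" -v d="$2" -v l="$3" \
        'BEGIN { printf "%.1f\n", 100 * (w + d / 2) / (w + d + l) }'
}

# 40 white wins, 35 draws and 25 black wins out of 100 games.
white_score 40 35 25   # prints 57.5
```

A move with those numbers scores 57.5%, which is above the 53.8% average and so counts as good for white.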

You can click on one of the moves in that tree window to advance the game by that move.  So, in one of my big databases I can see "e4" as the most popular move at the game start, and since I'm an e4 player anyway I can click on e4 and the game will step to the next set of possible moves (for black).

To continue this example, I'll step through an opening I see quite a lot, since I'm a Lopez player as white: the Closed defence to the Ruy Lopez.  Each move I make is shown on the list to the right, and each of black's possible options is also shown as per the database I have loaded.  I step through the various options and find myself in the Zaitsev variation (C92t if you want to look up the ECO code) and look at what is available.  One move I hadn't considered is 14. Nxd4 (I would normally take with the c pawn to retain the pawn center), but I can see it in my database so I'll go with it.

The next step is to turn the analysis engine on -- in this case I use stockfish with the opening book.  I mentioned setting this up a few blog posts ago, so if you're not sure how to do this check back here:


Now I turn on analysis mode by starting the engine through the scid menu: Tools -> Analysis Engine, and let it run.

Fairly quickly I find that Nxd4 is a reasonable move here.  Stockfish gives the position a value of +0.24 (meaning slightly favourable to white; a position value of +1.0 means white is ahead by 1 pawn), although I could probably have done better with cxd4.

The combination of an engine, an opening book file, and a game database as a study tool goes a long way towards playing strong chess, and scid wraps that up into a nice interface, all for free.