
The Total Chess Library

ArnieChipmunk
Being a database programmer, perhaps I shouldn't have been surprised when I recently dreamt I had to develop a chess database. But it wasn't an ordinary chess database.

Carceri XIV - Giovanni Battista Piranesi

I was told by a faceless person to make a chess database of all chess games ever played. If that sounds like a lot, it wasn't even all. The man told me it must also contain all chess analyses ever made, as well as every comment, opinion or text ever written about any move. It would be a database of all existing chess knowledge - an endless chess library. It was like making the chess version of Jorge Luis Borges' Total Library. The ultimate Mega Database - an entire chess universe.

I started by collecting all existing chess books ever written - both ancient manuscripts and newly printed books. I visited all chess libraries in the world and went through all privately owned chess book collections. But this clearly wasn't enough. I had to visit every chess player in person to ask for any scoresheets of games they had in their possession. Then I went through all local club magazines and internet blogs to find games I had missed. This reminded me that I had to get all chess magazines as well. And, of course, I downloaded all digital books, DVDs, game analyses and instruction guides on chess.

When I had categorized all the material and put it in a more or less logical order, I started thinking about how to put everything into a database. It didn't take me long to realize I wouldn't be able to use existing chess database software; it would simply be too impractical. For 1.e4 alone, hundreds if not thousands of comments somehow had to be entered, and a regular chess database program can't handle that. While some software lets you add comments in different languages, you can't add comments from different sources - at least not dynamically.

So I started thinking about how to develop this chess database myself. It had to have many more dimensions than current chess databases - in fact, it had to allow an unlimited number of entries for comments and analyses. All published praise of 47...Bh3!! and 23...Qg3!! had to be entered into the database somehow. It should also be possible to store multiple annotation symbols for the same move, because some commentators might have awarded these moves only one exclamation mark instead of two (a grave sin, I must say). The database design had to take this into account as well.

With the help of data warehouse design techniques, I was able to establish which dimensions my database should have. Obviously there should be dimensions with information about the sources (the books themselves) and about the games or game fragments: the players' names, the year and place in which a game was played, and so on. The moves and subvariations (including move numbers, to keep track of things) should be stored in a separate table, which in data warehouse terms is called a 'fact table'. Any game, including its sidelines, could develop like a garden of forking paths, leading to an endless number of moves.
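
To make the design a little more concrete, here is a minimal sketch, assuming SQLite and a hypothetical schema (all table, column, source and player names below are made up for illustration). It is a simplified dimension-and-fact layout rather than a full Data Vault model, which would split the same data into hubs, links and satellites. Sources and games are dimensions, moves form the fact table, and a separate annotation table lets any number of sources comment on, and award different symbols to, the same move.

```python
import sqlite3

# Hypothetical schema for illustration only; table and column names are
# assumptions, not taken from any existing chess database product.
SCHEMA = """
CREATE TABLE source (      -- dimension: books, magazines, blogs, manuscripts
    id     INTEGER PRIMARY KEY,
    title  TEXT NOT NULL,
    author TEXT
);
CREATE TABLE game (        -- dimension: a game or game fragment
    id    INTEGER PRIMARY KEY,
    white TEXT,
    black TEXT,
    year  INTEGER,
    site  TEXT
);
CREATE TABLE move (        -- fact table: one row per move in any line
    id        INTEGER PRIMARY KEY,
    game_id   INTEGER NOT NULL REFERENCES game(id),
    parent_id INTEGER REFERENCES move(id),  -- rows sharing a parent are alternative lines
    ply       INTEGER NOT NULL,             -- half-move index: 47...Bh3 is ply 94
    san       TEXT NOT NULL
);
CREATE TABLE annotation (  -- unlimited comments per move, each tied to a source
    id        INTEGER PRIMARY KEY,
    move_id   INTEGER NOT NULL REFERENCES move(id),
    source_id INTEGER NOT NULL REFERENCES source(id),
    symbol    TEXT,        -- the symbol awarded by that particular source
    comment   TEXT
);
"""


def build_demo() -> sqlite3.Connection:
    """Load one game fragment and two made-up sources that disagree on a move."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO game VALUES (1, 'Player A', 'Player B', NULL, NULL)")
    # A fragment that starts at the key move, so it has no parent move.
    conn.execute("INSERT INTO move VALUES (1, 1, NULL, 94, 'Bh3')")
    conn.executemany(
        "INSERT INTO source VALUES (?, ?, ?)",
        [(1, "Book X", "Author X"), (2, "Magazine Y", "Author Y")],
    )
    conn.executemany(
        "INSERT INTO annotation (move_id, source_id, symbol, comment) VALUES (?, ?, ?, ?)",
        [(1, 1, "!!", "A stunning shot."), (1, 2, "!", "Strong, but not unique.")],
    )
    return conn


def annotations_for(conn: sqlite3.Connection, move_id: int) -> list[tuple]:
    """Every annotation on one move, together with the source it came from."""
    return conn.execute(
        """SELECT s.title, a.symbol, a.comment
           FROM annotation AS a
           JOIN source AS s ON s.id = a.source_id
           WHERE a.move_id = ?
           ORDER BY s.title""",
        (move_id,),
    ).fetchall()


for title, symbol, comment in annotations_for(build_demo(), 1):
    print(f"47...Bh3{symbol} ({title}): {comment}")
```

Because the symbol lives on the annotation row rather than on the move, one source's "!!" and another's "!" can coexist without either overwriting the other.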

'Data Vault' model of a data warehouse



The same was obviously true of comments. But there was an additional problem: comments could be related not only to moves but also to the people who had written them. In his books, for example, Kasparov often refers to older authors. At this point in my dream, my faceless principal interrupted my musings. He ordered me to also store all information about the people who had written the annotations: what use would the project be otherwise? This meant I had to include the biographies of all chess commentators in my database. And of course, the commentators could also be chess players themselves, so they had to be linked back to the players and games dimensions.

When I had finished my design - or at least thought I had - a long-feared question arose in my head: where to start? Which data should be put into the database first? Would it be wise to work 'backwards' in time, starting with the most recent chess books and adding a database entry for every name, move or comment that wasn't in it yet? Or would it be wiser to start with the first chess manuscripts - the recent reconstruction of Francesch Vicent's mysterious treatise, the surviving games of Ruy Lopez, or perhaps even the earliest Arab chess problems?

In the end, I decided it wouldn't really matter - it was a Sisyphean task in any case - and so I started with a game collection from 2010. It happened to be a new book on Capablanca. Slowly but steadily I worked my way back. Then I realized I had forgotten something crucial. Comments could also contain references to other works - references to entries that didn't exist in my digital library yet! I was suddenly faced with what are sometimes called 'orphans': database references that can't (or can no longer) be traced back to their primary dimension. In order to proceed, I had to put all titles into the system first. And so I started again.
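
The orphan problem can be sketched as a simple referential-integrity check. This is a minimal illustration, assuming SQLite and two hypothetical tables: `work` as the catalogue of titles and `citation` as the references found inside comments (the titles used are placeholders, not real books). A `LEFT JOIN` keeps every citation and leaves the catalogue columns `NULL` whenever the cited title has not been loaded yet, which is exactly what happens when you work backwards from a 2010 book.

```python
import sqlite3

# Hypothetical tables for illustration: 'work' is the catalogue of titles,
# 'citation' records that a comment inside one work mentions another title.
SCHEMA = """
CREATE TABLE work (
    id    INTEGER PRIMARY KEY,
    title TEXT UNIQUE NOT NULL
);
CREATE TABLE citation (
    id          INTEGER PRIMARY KEY,
    from_work   INTEGER NOT NULL REFERENCES work(id),
    cited_title TEXT NOT NULL
);
"""


def find_orphans(conn: sqlite3.Connection) -> list[str]:
    """Return cited titles that have no matching row in the catalogue."""
    rows = conn.execute(
        """SELECT DISTINCT c.cited_title
           FROM citation AS c
           LEFT JOIN work AS w ON w.title = c.cited_title
           WHERE w.id IS NULL
           ORDER BY c.cited_title"""
    ).fetchall()
    return [title for (title,) in rows]


def build_demo() -> sqlite3.Connection:
    """Working backwards: the 2010 book is loaded before the works it cites."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO work (title) VALUES ('New Capablanca book (2010)')")
    conn.executemany(
        "INSERT INTO citation (from_work, cited_title) VALUES (1, ?)",
        [("Older annotated collection",), ("Lost manuscript",)],
    )
    return conn


conn = build_demo()
print(find_orphans(conn))
```

Loading every title into `work` before any citations, as the narrator eventually decides to do, makes the same query return an empty list.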

My success didn't last long. I soon found out that many chess authors refer to non-chess literature all the time. Kasparov quotes Ilf & Petrov, Donner quotes Nietzsche. Once you start paying attention to it, chess and literature are completely intertwined. To be complete, all of world literature would have to be included in the list as well. And that's only the beginning of a myriad of problems. For instance, how do you deal with references to literature that has been lost over the centuries?

I now realized the entire Total Chess Library idea would be quite pointless without access to each and every chess book ever written, every game and every analysis - including those that had been destroyed, mutilated or lost for good. I was trapped in a labyrinth I had created myself.

Then I woke up, of course. While I cycled to work, I thought about what use such a megalomaniac project could be. Nobody would ever be able to use this monstrous database. The information would be sitting there in some kind of supercomputer without anyone ever touching it. At first I felt anger, then sadness. Then I felt like nothing had really changed. It was just like work.

As I switched on my laptop at work and opened the data warehouse environment I was currently working on, I remembered the words from another Borges story, The Library of Babel:
At that time it was also hoped that a clarification of humanity's basic mysteries -- the origin of the Library and of time -- might be found. It is verisimilar that these grave mysteries could be explained in words: if the language of philosophers is not sufficient, the multiform Library will have produced the unprecedented language required, with its vocabularies and grammars.

For four centuries now men have exhausted the hexagons ... There are official searchers, inquisitors. I have seen them in the performance of their function: they always arrive extremely tired from their journeys; they speak of a broken stairway which almost killed them; they talk with the librarian of galleries and stairs; sometimes they pick up the nearest volume and leaf through it, looking for infamous words.

Obviously, no one expects to discover anything.
More from CM ArnieChipmunk
Why chess will never be popular

In praise of draws
