World's largest offline chess database for free.

codekiddy

Hello guys!

I've been looking all over the internet for the largest chess games database for some time, and there are people who have built such databases by collecting games.

Most of these free databases don't exceed 3 million games, which is of course a lot, but I wanted a larger one. So I looked for commercial solutions and found some interesting sites offering chess databases of various sizes, such as 5 million games, or even 10 million or more.

However, these commercial solutions cost money, and that's a big problem for people who simply can't or won't pay. On top of that, some websites don't even sell their databases and only offer online queries, so you have no control over the data.

So I came up with the idea of creating my own FREE database of games, the largest one, to beat all the other databases found on the internet!

I started collecting OTB games played by humans (not engines, not correspondence, etc.) from around the year 1500 up until now, that is, OTB games from the last 500+ years.

So far I have managed to collect over 30 million games by downloading other people's databases and PGN files from various websites. I merged them all into one big database and started removing duplicate games.

So far the number has dropped to some 20 million games, and it is still dropping...
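Duplicate removal of this kind can be sketched in a few lines of Python. This is only a minimal illustration, not necessarily the procedure used for this database: it assumes the games are available as PGN text, and it treats two games as duplicates when the player names and the move sequence match after comments, move numbers and the result token are stripped. The function names are made up for the example, and variations in parentheses are not handled.

```python
import re

TAG_RE = re.compile(r'^\[(\w+) "([^"]*)"\]', re.M)

def split_games(pgn_text):
    """Split PGN text into games; each game starts at an [Event ...] tag line."""
    parts = re.split(r"^(?=\[Event )", pgn_text, flags=re.M)
    return [p.strip() for p in parts if p.strip()]

def game_key(game):
    """Build a comparison key: lowercased player names plus the bare moves."""
    tags = dict(TAG_RE.findall(game))
    # Movetext is every line that is not a tag line.
    movetext = "\n".join(l for l in game.splitlines() if not l.startswith("["))
    movetext = re.sub(r"\{[^}]*\}", " ", movetext)   # drop {comments}
    movetext = re.sub(r"\d+\.+", " ", movetext)      # drop move numbers
    movetext = re.sub(r"(1-0|0-1|1/2-1/2|\*)\s*$", " ", movetext.strip())
    moves = " ".join(movetext.split())               # normalise whitespace
    return (tags.get("White", "").lower(), tags.get("Black", "").lower(), moves)

def dedupe(games):
    """Keep the first game seen for each key, preserving the input order."""
    seen = set()
    unique = []
    for game in games:
        key = game_key(game)
        if key not in seen:
            seen.add(key)
            unique.append(game)
    return unique
```

An exact-match key like this only catches games that are spelled identically after normalisation. Real collections also contain near-duplicates (different name spellings, truncated move lists), which need fuzzier comparison.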

I also created a project page on SourceForge to share my super-duper-jumbo database with other chess players around the globe for FREE!

I've been using the Scid vs. PC program to manage the database, so the database is in Scid format, meaning you can download it and do whatever you want with it, such as importing more games.

Please note that this project is still in progress, and I'm looking for help from other chess players to collect more games and to maintain the database!

The plan is to make the database available to everyone. More importantly, the more people are willing to contribute, the larger the database we will have, for FREE!

I'm still not sure how all this could work, so if you're interested, please either reply here or join the discussion forum on my project page to share your ideas on how to proceed with this project.

 

If you're a chess novice, you might ask yourself:

Q: Why on earth would anyone need a chess games database?

A: You can use a database to do the following things:

1. Study games played by chess masters over the last 500 years.

2. Study openings, middlegames and endgames.

3. Create high-quality opening books for use by chess engines.

4. Search for games played throughout chess history.

5. Play chess against the database (e.g. to learn opening variations).

6. Create opening books for a specific opening or variation and have a chess engine play that specific opening.

7. Solve puzzles and much, much more!

 

My project page:

https://sourceforge.net/projects/codekiddy-chess/

Discussion forums:

https://sourceforge.net/p/codekiddy-chess/discussion

Please note that the current database download is password-protected and contains duplicate games (a waste of bandwidth). I will soon upload a new, updated version which will be publicly available.

 

Thank you so much for your attention!

~codekiddy

 
notmtwain

I visited the site and it looks like you just started this project within the last couple of days.

Let me ask a novice question. I thought that modern chess engines like Stockfish weren't using opening books any more. Is that wrong?

codekiddy
notmtwain wrote:

Let me ask a novice question. I thought that modern chess engines like Stockfish weren't using opening books any more. Is that wrong?

Hi notmtwain,

Not entirely true; there are engines that do use opening books.

As for the Stockfish 6 engine, they say it's up to the GUI to provide an opening book to the engine, as discussed at the link below:

http://support.stockfishchess.org/discussions/questions/1709-stockfish-6-book-file

EscherehcsE
Mahdibaghbani1998 wrote:

Hi, thanks for your great project. I use Chessbase for storing and analyzing my games and for learning. I also have Fritz 15 and Fritz 11 SE. How can I use your database instead of Fritz's main database? (Fritz's has 600k games.)

I guess you'll have to use Scid or Scid vs. PC to export the games to a PGN file, then import the PGN file into Chessbase. Working with large PGN files will be terribly slow, though; you might want to export the database in chunks.
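If the export has already produced one huge PGN file, it can also be cut into chunks afterwards. The sketch below is one possible way to do that in Python, under the assumption that every game in the file begins with an `[Event "..."]` tag line, as exported PGN normally does. The function names, the chunk size and the output file naming are made up for the example.

```python
def chunk_games(lines, games_per_chunk):
    """Yield lists of lines, each holding at most games_per_chunk games."""
    chunk, count = [], 0
    for line in lines:
        if line.startswith("[Event "):
            # A new game starts here; flush first if the chunk is full.
            if count == games_per_chunk:
                yield chunk
                chunk, count = [], 0
            count += 1
        chunk.append(line)
    if chunk:
        yield chunk

def split_pgn(path, games_per_chunk=100_000):
    """Write path.part001.pgn, path.part002.pgn, ... next to the input file."""
    # latin-1 maps every byte to one character, so unusual encodings in
    # player names pass through unchanged instead of raising an error.
    with open(path, encoding="latin-1") as src:
        for i, chunk in enumerate(chunk_games(src, games_per_chunk), start=1):
            with open(f"{path}.part{i:03d}.pgn", "w", encoding="latin-1") as out:
                out.writelines(chunk)
```

The file is read line by line, so memory use depends on the chunk size rather than on the size of the whole PGN file.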

EscherehcsE

Well, it only took about 5 minutes to export the 2.5 million-game, high-quality database to a PGN file; that's not too bad.

However, I'm not sure that's any better than what you can already get from Ed Schroeder's MillionBase 2.5 database (also about 2.5 million games), which is already available in Chessbase format:

http://rebel13.nl/rebel13/rebel%2013.html

EscherehcsE
Mahdibaghbani1998 wrote:

Must I replace Fritz's main database with this database, or just use it alongside Fritz's database?

It's been a long time since I've used Fritz, but I guess you can do it either way. Either close the Fritz main database and open the MillionBase database, or import the MillionBase database into your Fritz database. However, if you do the latter, I'm sure you'll end up with duplicate games that you'll have to delete. (Make sure you back up your Fritz database before you start experimenting with merging the two databases.)

adrk

What is the password for unzipping?

torrubirubi

This looks like a huge project. What is your financial return for investing so much time in such a project? I am just asking, as I surely would use your free database.

torrubirubi

Okay, I see now: with some advertisements. That's okay.

OpeningMaster

Hi codekiddy,

Nice work. I would like to ask how you managed to collect so many games at once. Quoting your statement: I managed to collect over 30 Million games so far by downloading other people databases and PGN files from various websites...

This is actually an admission that you have taken somebody's protected, copyrighted work for free. You do understand that although an individual chess game is free under international copyright law, a collection of games, a so-called chess database, is protected material and cannot be copied in whole or in part without the owner's permission. We (openingmaster dot com) have been collecting human chess games manually since 2004 through paid sources and tournament publishers, and we have offered our products commercially ever since (59 EUR per year per subscription). We started our collection at around 3 million human games and now have OM GOLEM with 26 million human games (no computer games), OM OTB with 8.4 million human over-the-board games, and OM CORR with 1.7 million human correspondence games without duplicates. I am therefore a bit surprised that you decided to copy everything from everywhere and offer it on your website or on SourceForge. As much as I would love a free lunch for the whole world, we invest resources every month to collect and publish the games. Please read our terms and conditions and provide an official statement or feedback.

Best regards,

Alexander Horvath

founder of Opening Master

The Chess Database company

EscherehcsE

Well, codekiddy appears not to have done anything for over two years. (Another abandoned project, at least for now.) Anyway, rereading the original post, I don't think the OP ever states that he took copyrighted material. Maybe he did, maybe he didn't. Maybe he only took games from free databases.

Have you confirmed that he took copyrighted material from your databases? Do your databases contain annotations? (And I'm guessing the copyright issue is another potential can of worms...)

OpeningMaster

Well, it is indeed a delicate topic. I haven’t actually accused anybody of anything specific; the activity described in the original post just seemed like copying everything from everywhere. It’s easy to buy Chessbase Big or Huge or Chess Assistant Megs, add all the games to your own “all collection” database, and call it “your project” or “your games”, but it’s actually wrong and, if proven, illegal too. Regarding your question, we don’t hold annotations but rather focus on statistics within the OM chess database: indexing, de-duplicating, adding, cleaning, and so on. Right now, we are going through the DB. It is damn difficult to prove whether it was your own collection or the games came from elsewhere, but we have our methods. Hope to see some feedback from codekiddy soon. Best regards, Alexander H

EscherehcsE

I hope it works out for everyone concerned. I only downloaded codekiddy's DB temporarily to answer a few questions; I didn't even keep a copy. (I tend not to be very interested if there might be a considerable number of duplicates or very low-rated players.) I have downloaded a number of free DBs, but I hardly ever use them. I think I still have GorgoBase in my Scid vs. PC program, but I've never updated it, and I could probably count on one hand the number of times I've used it in the last year.

If someone needs a comprehensive and largely error-free games DB for serious study, I can see the value in what you offer.

But if I were you, I wouldn't hold my breath waiting for codekiddy to comment further; he hasn't logged in for 2-1/2 years.

OpeningMaster
MazeOfMoves wrote:

Why is a large database a good thing? Wouldn't it make more sense to have a concise collection of essential games instead?

It is statistics. If you have a quality chess database (like Opening Master :-)), you will end up with many more games for particular lines and openings. When it is used as a reference DB in a program like Chessbase or SCID, the player gets a more accurate percentage when predicting the path and the score. Imagine you have a DB of 1,000 games to compare against, or one of 10,000,000. If you are on the 14th move with White or Black and this position has been played 230,000 times with a given continuation, your DB should tell you your chances exactly. And believe me, whether it says a 46% chance of winning or 56% makes a difference. It's like predicting the future, if you like, or learning while playing.
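The statistics argument can be made concrete with a small Python sketch. It is not how Chessbase, SCID or Opening Master compute their figures. It assumes each game is simply a pair of a move list and a PGN result string, and it uses the textbook normal approximation for the margin of error of a proportion to show why more games give a tighter estimate. All names are made up for the example.

```python
from collections import Counter
from math import sqrt

def score_after(games, prefix):
    """Tally results of all games whose moves start with `prefix`.

    `games` is an iterable of (moves, result) pairs, where `moves` is a
    list of SAN strings and `result` is "1-0", "0-1" or "1/2-1/2".
    Returns None when no game reaches the line.
    """
    n = len(prefix)
    tally = Counter(result for moves, result in games if moves[:n] == prefix)
    total = sum(tally.values())
    if total == 0:
        return None
    # White's score counts a win as 1 point and a draw as half a point.
    white_pct = 100.0 * (tally["1-0"] + 0.5 * tally["1/2-1/2"]) / total
    return {"games": total, "white_score_pct": white_pct}

def margin_of_error(p, n):
    """Approximate 95% margin of error for a proportion p seen in n games."""
    return 1.96 * sqrt(p * (1.0 - p) / n)
```

With a proportion near 0.5, `margin_of_error` gives roughly 0.098 (about ten percentage points) for 100 games and roughly 0.002 for 230,000 games. On that basis a gap like 46% versus 56% only becomes meaningful once a line has been played often enough.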

@EscherehcsE, you are OK; just buy a commercial DB product once in a while. We are sure we have all your games and others' too. 99%.

 

kennet_eriksson

You can create large databases just by finding games and collections that are free to download. Mine has just over 7M games. I don't search for computer games or for games played on internet servers. I believe it has few duplicates and is of fairly high quality.

OpeningMaster

While I agree that many games are free on the internet and that various tournament websites publish their PGNs for free, unless you have been doing it for 12+ years like us, you won't manage to collect more than a few hundred thousand games unless you download a commercial chunk as a "basis" and then add some minor collections found online. For example, the Opening Master OTB version has 8.8 million human games without duplicates (our largest collection, OM GOLEM, has 24 million). Whatever we do and however hard we try, we always end up adding 50K-100K games in the next release. So creating a DB of 7 million human games from free games published here and there is simply mission impossible; you need to have got the bulk from somewhere. That is just my opinion, as it would be an endless battle to prove which came first, the chicken or the egg. Plus, you use it for personal purposes, so no harm there.

We would rather focus on our work, publish regular updates for 59 EUR per year, which I believe is a fair commercial price, and focus on the SELECTIVE groups that MazeOfMoves mentioned. Cleaning, indexing and de-duplicating have been a nightmare even with the benefits of current hardware and software.

Cheers all

Alexander

OpeningMaster

Agree here. We also publish OM 2500+, which is by definition the GM database. It’s good for learning but smaller in size. In the main database, whenever we reach some critical mass (e.g. 10 million in OM OTB), we raise the minimum ELO. A long time ago we included everything, even kids' tournaments, and realized it was not relevant for statistics, so we started at 1300, and some years back we moved up again to 1500. Who knows, by 2025 we may be at ELO 2000, or start a new AI version with ELO 4000. Good luck in your games; chances are we have them if they were published somewhere.
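A rating cut-off like the one described here can be approximated on any PGN collection. The Python sketch below is an assumption about how such a filter might look, not Opening Master's actual process. It relies on the optional `WhiteElo` and `BlackElo` tags from the PGN standard, and it drops games where either rating is missing or not a number. The function names and the default threshold are made up for the example.

```python
import re

TAG_RE = re.compile(r'^\[(\w+) "([^"]*)"\]', re.M)

def lower_elo(game_text):
    """Return the lower of the two players' ratings, or None if unknown."""
    tags = dict(TAG_RE.findall(game_text))
    try:
        return min(int(tags["WhiteElo"]), int(tags["BlackElo"]))
    except (KeyError, ValueError):
        # A tag is missing, empty, or holds a placeholder such as "?".
        return None

def keep_rated(games, threshold=1500):
    """Keep only games in which both players are rated at least `threshold`."""
    kept = []
    for game in games:
        elo = lower_elo(game)
        if elo is not None and elo >= threshold:
            kept.append(game)
    return kept
```

Many older games carry no ratings at all, so a strict filter like this would discard most historical material. Whether unrated games should be kept is a separate decision.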

kennet_eriksson

@openingmasterreal
Yes, I've been collecting for years. For example, I've followed TWIC for a long time now.
A starting point for collecting might be:

http://www.chessgameslinks.lars-balzer.info/

You can find more by searching for something like 'free chess database'.
Some national chess organisations and many chess clubs have games to download.
Anyone can create a large quality database if they invest time. Of course, saving time by buying a database is fine with me.

superchessmachine

Wow!

OpeningMaster

@kennet_eriksson, we know Lars; we have been working with him since the very beginning (see our logo on his homepage). So we understand that some players choose to seek out free databases, while others do not have so much time and rely on commercial solutions.

Regards

Alexander