Is there a database with all finished games in chess.com history?

dying_sphynx
Martin_Stahl wrote:

They have a lot fewer games, think at least one or two orders of magnitude smaller.

Currently, I can see that there are 4.97 billion games in Lichess database (available via Tools -> Analysis Board and enabling Opening Explorer), so that's not really true that it's "orders of magnitude smaller", it's roughly 7-8 times smaller. I don't believe that this makes a significant difference (as in it's possible to do it for 5 bn games and impossible for 50 bn games).

Given that chess.com is a commercial service and as a result they have more resources, it looks surprising to me that there is no feature like that. It's extremely useful for building opening repertoires tailored for a particular rating range, I use it on Lichess almost daily.

justbefair
dying_sphynx wrote:
Martin_Stahl wrote:

They have a lot fewer games, think at least one or two orders of magnitude smaller.

Currently, I can see that there are 4.97 billion games in Lichess database (available via Tools -> Analysis Board and enabling Opening Explorer), so that's not really true that it's "orders of magnitude smaller", it's roughly 7-8 times smaller. I don't believe that this makes a significant difference (as in it's possible to do it for 5 bn games and impossible for 50 bn games).

Given that chess.com is a commercial service and as a result they have more resources, it looks surprising to me that there is no feature like that. It's extremely useful for building opening repertoires tailored for a particular rating range, I use it on Lichess almost daily.

But why would you use it almost daily? How is it useful to you?

Martin_Stahl
dying_sphynx wrote:
Martin_Stahl wrote:

They have a lot fewer games, think at least one or two orders of magnitude smaller.

Currently, I can see that there are 4.97 billion games in Lichess database (available via Tools -> Analysis Board and enabling Opening Explorer), so that's not really true that it's "orders of magnitude smaller", it's roughly 7-8 times smaller. I don't believe that this makes a significant difference (as in it's possible to do it for 5 bn games and impossible for 50 bn games).

Given that chess.com is a commercial service and as a result they have more resources, it looks surprising to me that there is no feature like that. It's extremely useful for building opening repertoires tailored for a particular rating range, I use it on Lichess almost daily.

I believe the site has been hitting close to a billion games a month (north of 600 million a month) and has been for a while. Yes, the site is bigger with more resources, but providing access to those would use significantly more resources as well, especially since a lot of people would be interested in the data.

dying_sphynx
justbefair wrote:

But why would you use it almost daily? How is it useful to you?

Whenever I see a new move or variation tried against me (or the old one which I just forgot) I am interested in seeing how different replies to that move performed against opponents of my level. So on Lichess I can go to the Opening explorer, filter the games down to ones played within my rating range (as opposed to Master games) and then I can see how particular moves perform (i.e. what's their winning rate and how often they are played). Sometimes a move that may not be the strongest according to the engine or a book (or one of the many options) may be quite tough at my level because it presents certain practical problems that people don't know how to solve. Then it's worth investigating.
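The kind of lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the game records, the `reply_stats` function and the rating band are all made up for the example, and it is not how Lichess or chess.com implement their explorers.

```python
from collections import defaultdict

# Hypothetical toy records: (average rating, moves in SAN, result from White's view).
GAMES = [
    (1450, ["e4", "e5", "Nf3", "Nc6"], "1-0"),
    (1500, ["e4", "e5", "Nf3", "d6"], "0-1"),
    (1520, ["e4", "e5", "Nf3", "Nc6"], "1/2-1/2"),
    (2400, ["e4", "e5", "Nf3", "Nf6"], "1-0"),
    (1480, ["e4", "c5"], "0-1"),
]


def reply_stats(games, prefix, min_rating, max_rating):
    """Count each reply to the move sequence `prefix` within a rating band."""
    stats = defaultdict(lambda: {"games": 0, "white_wins": 0})
    n = len(prefix)
    for rating, moves, result in games:
        if not (min_rating <= rating <= max_rating):
            continue  # outside the rating range we care about
        if len(moves) <= n or moves[:n] != prefix:
            continue  # game never reached this line, or ended there
        entry = stats[moves[n]]
        entry["games"] += 1
        entry["white_wins"] += result == "1-0"
    return dict(stats)


stats = reply_stats(GAMES, ["e4", "e5", "Nf3"], 1400, 1600)
for move, s in sorted(stats.items()):
    print(move, s["games"], s["white_wins"] / s["games"])
```

The 2400-rated game is ignored by the filter, which is the whole point: the replies and their results come only from games in the chosen rating band.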

There are whole chess start-ups built on this idea for repertoire building; see ChessMadra, for example. Also see this blog post from Nate Solon about this approach: https://zwischenzug.substack.com/p/the-bootleg-spacebar

Of course, one needs to understand ideas behind moves too, and not just play blindly whatever has a high winning rate, but that is a good starting point in one's investigation.

dying_sphynx
Martin_Stahl wrote:

I believe the site has been hitting close to a billion games a month (north of 600 million a month) and has been for a while. Yes, the site is bigger with more resources, but providing access to those would use significantly more resources as well, especially since a lot of people would be interested in the data.

I don't fully understand your position on it: are you saying that it's impossible to provide this chess database for *all* games? (They can restrict it, and do it for the last 50 billion, or just rapid games, or whatever. Also it's possible to scale stuff, here we are not talking about some exponential growth).

Or that it's not useful for the users and therefore not worth having it on the site? (I disagree)

Or that it's just too wasteful for chess.com programmers/resources? (It's hard to judge on that, I don't know what their priorities are, but I hope that providing useful features is close to the top!)

Martin_Stahl

I'm saying that it would use a lot of resources; storage and bandwidth likely top that list, but staff resources would be needed too (to design a system to do it and to maintain it).

Probably the biggest would be the number of downloads of any DB exports and the associated cost to serve those up to meet the demand.

dying_sphynx
Martin_Stahl wrote:

Probably the biggest would be the number of downloads of any DB exports and the associated cost to serve those up to meet the demand.

There is no need to provide it as a downloadable file. I meant just exposing it as Opening Explorer on Lichess.

Martin_Stahl
dying_sphynx wrote:
Martin_Stahl wrote:

Probably the biggest would be the number of downloads of any DB exports and the associated cost to serve those up to meet the demand.

There is no need to provide it as a downloadable file. I meant just exposing it as Opening Explorer on Lichess.

The size and performance of an Explorer database of that size would be problematic as well. I know there is some work underway around game storage to make games easier to work with, but I don't believe they're going to be expanding it to all games.

dying_sphynx

So how about limiting it to 5 billion games and using the Lichess approach? Their UI is very responsive and I haven't experienced any issues with it.

dying_sphynx

Anyway, there is not much point in arguing about that; of course this feature is unlikely to be implemented any time soon. I can just use the Lichess Opening Explorer for the time being; it's good enough for my purposes.

LateToMate
Out of curiosity, are there published rules for which games are included in the existing chess.com Opening Explorer? I have noticed that, even when looking up a specific player, sometimes only a small subset of their games are available.
tlay80
dying_sphynx wrote:

So how about limiting it to 5 billion games and using the Lichess approach? Their UI is very responsive and I haven't experienced any issues with it.

I'd be interested in this too. Even if you limited it to rapid and daily games, on the theory that they're more meaningful than blitz and bullet, that could be quite useful.

Martin_Stahl
LateToMate wrote:
Out of curiosity, are there published rules for which games are included in the existing chess.com Opening Explorer? I have noticed that, even when looking up a specific player, sometimes only a small subset of their games are available.

I'm not sure how accurate this is anymore but https://support.chess.com/article/368-what-is-the-game-explorer

  • Currently, the Explorer does not index Live Chess games shorter than 5 | 0, Chess960 games, or any games of fewer than four moves in length.

I believe there may also be some issues around the processing of games for the Explorer databases. The project I mentioned earlier around game storage will likely fix the issues seen with missing games in the tool.

LateToMate
Martin_Stahl wrote:
LateToMate wrote:
Out of curiosity, are there published rules for which games are included in the existing chess.com Opening Explorer? I have noticed that, even when looking up a specific player, sometimes only a small subset of their games are available.

I'm not sure how accurate this is anymore but https://support.chess.com/article/368-what-is-the-game-explorer

  • Currently, the Explorer does not index Live Chess games shorter than 5 | 0, Chess960 games, or any games of fewer than four moves in length.

I believe there may also be some issues around the processing of games for the Explorer databases. The project I mentioned earlier around game storage will likely fix the issues seen with missing games in the tool.

Thank you!

AlexeyChess

Several comments:

  1. When you see any chess.com stats about games played, keep in mind that about half of the games played are human vs. computer (bot); if the number is big, it’s very possible those games are included. This matters because for many people only human vs. human games have value.
  2. Although the Lichess platform is much smaller, it is big enough to catch trends in online chess, and having stats for both platforms at the same point in time provides a way to extrapolate. For example, a public statement from Feb-2023: “In total, there were an astonishing 1,057,320,754 games played on Chess.com. Of those, 576,946,832 were live games, of which 3,181,513 were daily games, and 480,373,922 were against the computer.” So in Feb-2023 there were about 580M games between humans on chess.com and 98.5M on Lichess, a coefficient of about 5.9.
  3. For its Opening Explorer, Lichess uses positions, not games. This transformation may seem crazy, as the number of positions is much larger, but it is compensated for by ridiculously fast modern key-value stores. They use RocksDB, which works at petabyte scale at Facebook, so even a chess.com 10 times the size of Lichess is not an obstacle. There is also deep wisdom in the position-centric approach: if we think about common use cases, most people will actively explore the tree close to its beginning, where the position approach is extremely effective.
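
The arithmetic in points 1 and 2 can be checked with a few lines of Python. The chess.com numbers are the ones in the quoted statement; the Lichess figure is the rounded 98.5M from the post, so treat the results as approximate.

```python
# Figures from the quoted Feb-2023 statement.
total_games = 1_057_320_754
live_games = 576_946_832
vs_computer = 480_373_922

# Rounded Lichess figure for the same month, as given in the post.
lichess_games = 98_500_000

# The live and computer games add up to the quoted total.
assert live_games + vs_computer == total_games

computer_share = vs_computer / total_games
coefficient = live_games / lichess_games

print(f"share of games against the computer: {computer_share:.0%}")
print(f"chess.com / Lichess coefficient: {coefficient:.1f}")
```

So "about half" is closer to 45% for that month, and the coefficient comes out at roughly 5.9.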
dying_sphynx
AlexeyChess wrote:

For its Opening Explorer, Lichess uses positions, not games. This transformation may seem crazy, as the number of positions is much larger, but it is compensated for by ridiculously fast modern key-value stores.

All interesting points, thank you! I didn't realise that games played against computers represent such a considerable share.

Regarding games vs. positions in the Explorer -- could you please elaborate? To me, any state in the Opening Explorer represents a particular position, i.e. a node in a graph where certain moves may lead to other nodes/positions, and those moves are annotated with winning ratios and number of games played. Basically, if we use a graph model, a position is a node and a move is an edge of the graph.

So I am not sure how it can be the other way around, i.e. how it can use *games* and not positions?

Perhaps you mean that they use positions to build the actual graph, i.e. first they generate and store all positions from the games, and then use those positions and moves between them to build the graph (Opening Tree).

AlexeyChess

Yes, I mean the underlying data structure.

Let’s say we have 10 bn games in total. We can store them game-wise, which leads to a table of 10 bn rows, or we can store them position-wise, which leads to 800 bn rows (40 moves on average, so 80 half-moves). That looks crazy, but it provides a very natural structure for the game explorer's needs. If your main use cases are showing my own games or the games of a particular player, the game-wise approach is very good, but when you need a game explorer over all games played on the platform, the game-wise approach leads to enormous computations.
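
A minimal Python sketch of the position-wise idea, under loud assumptions: a plain dict stands in for a key-value store such as RocksDB, the key is just the sequence of moves played so far (a real system would more likely key on a hash of the position, so that transpositions merge), and the names `build_position_index` and `explore` are invented for the example. It is not a description of the actual Lichess or chess.com code.

```python
from collections import defaultdict

# Hypothetical toy games: (moves in SAN, result from White's view).
GAMES = [
    (["e4", "e5", "Nf3"], "1-0"),
    (["e4", "e5", "Nc3"], "0-1"),
    (["e4", "c5"], "1/2-1/2"),
]


def new_counters():
    return {"1-0": 0, "0-1": 0, "1/2-1/2": 0}


def build_position_index(games):
    """Write one entry per half-move: position key -> move -> result counters."""
    index = defaultdict(lambda: defaultdict(new_counters))
    for moves, result in games:
        for ply, move in enumerate(moves):
            key = tuple(moves[:ply])  # stand-in for a real position hash
            index[key][move][result] += 1
    return index


def explore(index, key):
    """A single key lookup answers an explorer query; no games are scanned."""
    return {move: dict(counts) for move, counts in index.get(key, {}).items()}


index = build_position_index(GAMES)
print(explore(index, ()))        # moves played from the starting position
print(explore(index, ("e4",)))   # replies to 1. e4

# Row estimate from the post: 10 bn games at 80 half-moves each.
estimated_rows = 10_000_000_000 * 80
print(f"{estimated_rows:,} position rows")
```

The cost is paid at write time (one update per half-move), while each explorer query becomes one lookup. With game-wise storage the same query would have to scan or aggregate over every matching game.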

I don’t know exactly how this works inside chess.com. My guess is based on posts from people connected to chess.com that mention computational issues, which could only arise if they took a different approach.