How We Built A Puzzle Database With Half A Million Puzzles
What makes a good puzzle? How are Chess.com's 500k+ puzzles generated? And what's the relationship between your Elo rating and your puzzle rating?
To celebrate Puzzle Week 2023, some of Chess.com's top engineering minds helped answer some of your most burning questions about all things puzzle-related. The results can be seen below!
[Answers courtesy of Roland Walker, Dodge Coates, and Ethan Metzger.]
Where did our original puzzle database come from?
Our database of puzzles has always been largely derived from real games. Classical games are in the mix, but by far the largest contributor is live gameplay by members on Chess.com. Our algorithm “walks” the positions in all the games played until it finds a position that can be thought of as a tactical puzzle. We started adding puzzles from member games in July 2007.
There are also some composed puzzles, but they make up only a small fraction of the total. As of the beginning of 2023, we have over 570,000 puzzles.
How did we get to that number?
In 2018 we suffered a “ratings crash” and decided to revamp and expand puzzles, in part to fix the low ratings, and in part to prepare for the launch of Puzzle Rush. In November of that year, we had about 58,000 puzzles. We started by taking a hard look at quality, delisted many puzzles, and were left with about 49,000. We then started a push for both quality and quantity.
How many puzzles do we add a day?
About 900, with plans to potentially increase that amount.
What makes a good puzzle?
For generating puzzles programmatically, there are three main conditions:
- There should be only one good move for each player move (except for the last move)
- The opponent's move played should not be terrible
- The winningness of the final position of the puzzle should be “obvious”
The trick is getting that last part right and checking the first two efficiently!
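As a rough illustration, the three conditions above could be checked against engine scores along these lines. This is only a sketch, not Chess.com's actual generator: the `Ply` structure, the `looks_like_puzzle` function, and every threshold (`unique_margin`, `blunder_tolerance`, `winning_threshold`) are hypothetical, and the scores are assumed to be centipawn evaluations supplied by some chess engine.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Ply:
    """One half-move of a candidate puzzle line (hypothetical structure)."""
    solver_to_move: bool
    # Assumed engine scores in centipawns for the legal moves, from the
    # mover's point of view, sorted best first (index 0 is the best move).
    candidate_scores: List[int]
    # Index into candidate_scores of the move actually played in the line.
    played_index: int = 0


def looks_like_puzzle(
    line: List[Ply],
    final_score_for_solver: int,
    unique_margin: int = 150,      # assumed gap between best and second-best move
    blunder_tolerance: int = 100,  # assumed allowed loss for the opponent's move
    winning_threshold: int = 300,  # assumed score at which the win is "obvious"
) -> bool:
    solver_indices = [i for i, ply in enumerate(line) if ply.solver_to_move]
    if not solver_indices:
        return False
    last_solver_ply = solver_indices[-1]

    for i, ply in enumerate(line):
        scores = ply.candidate_scores
        if ply.solver_to_move:
            # The solution move must be the engine's top choice.
            if ply.played_index != 0:
                return False
            # Condition 1: only one good move, except on the last solver move.
            has_alternative = len(scores) > 1
            if i != last_solver_ply and has_alternative:
                if scores[0] - scores[1] < unique_margin:
                    return False
        else:
            # Condition 2: the opponent's reply should not be terrible.
            if scores[0] - scores[ply.played_index] > blunder_tolerance:
                return False

    # Condition 3: the final position should be clearly winning.
    return final_score_for_solver >= winning_threshold
```

In this sketch, the expensive part is producing `candidate_scores` for every position, which is why the efficiency of the first two checks matters so much in practice.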
How do we rate puzzles?
We use a mix of machine learning and behavioral data. The important thing about our system is that we try to minimize mismatches for the user, and when a mismatch in rating might occur, we arrange it so that the user cannot lose rating points.
How should members think about their puzzle ratings?
If your puzzle rating is 1400, you ought to be able to solve a 1400-rated puzzle 50% of the time. But puzzle ratings are separate from gameplay ratings, and have drifted upward relative to gameplay ratings over time. This is something we might address in the future.
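To make the 50% figure concrete, here is a small sketch that assumes the classic Elo expected-score curve. The exact formula behind Chess.com's puzzle ratings is not stated here, so treat the function name `expected_solve_probability` and the 400-point scale as illustrative assumptions.

```python
def expected_solve_probability(player_rating: float, puzzle_rating: float) -> float:
    """Chance of solving a puzzle under the standard Elo logistic curve.

    Assumption: the puzzle is treated like an opponent with its own rating,
    and the usual 400-point scale applies.
    """
    return 1.0 / (1.0 + 10.0 ** ((puzzle_rating - player_rating) / 400.0))


# Equal ratings give an even chance of solving.
print(expected_solve_probability(1400, 1400))  # 0.5
```

Under this assumed curve, a puzzle rated 400 points above you would be solved about 1 time in 11, and one rated 400 points below you about 10 times in 11.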
How are puzzles broken down by theme?
By a mix of automated technology and user votes. Some themes can still only be assigned by voting, but those are a minority. The automated theming has been greatly improved, and the improvements will roll out to the puzzle data in 2023.
Did Puzzle Rush change how we built our puzzle database?
Absolutely! We lowered the minimum rating, completely changed the rating technology, and also changed the tech to “steer” which puzzles were added to the database, all in the name of providing a good spread of puzzles in every rating, with a smooth upward trend for the Puzzle Rush experience.
What happens when I report a bad puzzle?
When a puzzle accrues a certain number of downvotes, it is reviewed by a panel of titled players and puzzle experts. Bad puzzles are then delisted so that they will never be served to members again.
Are there enough puzzles in my own rating range?
Yes. We never stop working on this. There are at least 8,000 puzzles in every rating band up to 2400, with more being added every day.
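A coverage check like this can be sketched in a few lines. The helper names `count_by_band` and `thin_bands`, the 100-point band width, and the lowest band are assumptions made for illustration; only the 8,000-puzzle minimum and the 2400 ceiling come from the answer above.

```python
from collections import Counter
from typing import Iterable, List


def count_by_band(ratings: Iterable[int], band_width: int = 100) -> Counter:
    """Count puzzles per rating band, keyed by the band's lower bound."""
    return Counter((rating // band_width) * band_width for rating in ratings)


def thin_bands(
    ratings: Iterable[int],
    min_per_band: int = 8000,  # minimum quoted in the article
    lowest: int = 0,           # assumed lowest band start
    highest: int = 2400,       # ceiling quoted in the article
    band_width: int = 100,     # assumed band width
) -> List[int]:
    """Return the lower bounds of bands holding fewer than min_per_band puzzles."""
    counts = count_by_band(ratings, band_width)
    return [
        start
        for start in range(lowest, highest, band_width)
        if counts[start] < min_per_band
    ]
```

An empty result from `thin_bands` would mean every band in the checked range meets the minimum; any band starts it returns are the ranges where new puzzles are most needed.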
We also work on the balance of checkmate and material puzzles and are always tuning the data to improve the member experience.
Try some of the puzzles below, taken from real-life games played by GM Magnus Carlsen. One of them is rated 342, another 724, and the third 1926. Can you guess which is which?
Puzzle 1:
Puzzle 2:
Puzzle 3:
We hope you enjoyed this discussion about puzzles. Let us know your thoughts about puzzles and Puzzle Rush in the comments below!