Puzzle ratings and success rates


Just noticed a 1200-rated puzzle in Puzzle Rush that has a remarkable 7.2% success rate! How does the system work that allocates ratings to puzzles and delivers puzzles to users in a way that results in such a surprisingly one-sided stat?

Here it is:



I assume people are drawn to the dark-squared diagonals around the black king and maybe play something like Qh6 automatically, hoping for a quick point.

Moving the bishop back is not a natural move at all in a tactics problem unless you take the time to study the position.


My point is that, roughly speaking, this problem should have a rating about 350 points higher than that of its average solver on Tactics Trainer, on the Elo scale. For that to be so, something would need to be stopping it from being presented to users in the rating range that would give it a normal success rate. Otherwise, its rating is simply wrong in any reasonable sense.
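For anyone who wants to check the arithmetic, here is a minimal sketch of the rating gap implied by a given success rate. It assumes the plain logistic Elo expected-score formula with the usual 400-point scale; the site may well use something else (Glicko, time bonuses, partial credit), and the function names are just illustrative.

```python
import math


def expected_score(solver_rating, puzzle_rating):
    """Expected solver success under the standard logistic Elo model (an assumption)."""
    return 1.0 / (1.0 + 10 ** ((puzzle_rating - solver_rating) / 400.0))


def implied_gap(success_rate):
    """Rating gap (puzzle minus solver) at which expected success equals success_rate."""
    return 400.0 * math.log10((1.0 - success_rate) / success_rate)


# Success rate quoted for the puzzle in this thread.
gap = implied_gap(0.072)
print(round(gap))
```

For 7.2% this comes out at roughly 444 points, so under this model the gap is at least as large as the rough figure in the post.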

Rating problems is not trivial, because there is an element of time as well as the results (and there is the question of whether partial success counts for anything), but none of these issues explains this sort of discrepancy, IMO.

The ideal would be something like each problem being presented to users in a rating range that gives it a normal success rate. This is compatible with players who take their time achieving a higher success rate and those who move quickly getting a lower one, as well as with offering each user problems across a wide range of difficulty.


Every time I read a post by Elroch, this tune pops up in my head.


I am humbled. I am listening to the whole soundtrack now.


Here is the list of all puzzles:


Now, sort them by Pass Rate (lowest first) and try some of the least-passed ones (usually some stalemate or underpromotion thing). You'll see that they are often really hard... and then look at the actual puzzle rating.

Maybe it doesn't fully apply to the puzzle you gave, but I suspect there's a "problem" with how the rating is calculated: it seems to weigh the number of moves people played correctly, together with their ratings, too directly.
I.e. even if a puzzle is very hard, when most of its difficulty lies in a single key move, especially one that is not the first move, the rating gets lowered because people found several or most of the other moves.

So when low- or medium-rated players get most or part of a puzzle's moves right, even if they miss the key move and fail (pushing the pass rate below 10%, or even below 5%), their partial passes still count toward the puzzle's rating and lower its difficulty rating.
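To make the suspected effect concrete, here is a toy sketch. It is not the site's actual algorithm, which I don't know: it assumes a plain Elo-style update with a hypothetical K-factor of 32 and compares scoring an attempt as pass/fail with scoring it as the fraction of moves played correctly.

```python
def expected_score(solver_rating, puzzle_rating):
    """Expected solver score under a standard logistic Elo model (an assumption)."""
    return 1.0 / (1.0 + 10 ** ((puzzle_rating - solver_rating) / 400.0))


def updated_puzzle_rating(puzzle_rating, solver_rating, solver_score, k=32.0):
    """Hypothetical Elo-style update: the puzzle gains whatever the solver underperforms by."""
    expected = expected_score(solver_rating, puzzle_rating)
    return puzzle_rating + k * (expected - solver_score)


# A 1200-rated solver fails a 1200-rated, 4-move puzzle by missing only the key move.
binary = updated_puzzle_rating(1200, 1200, solver_score=0.0)     # scored as a plain fail
partial = updated_puzzle_rating(1200, 1200, solver_score=3 / 4)  # scored as 3 of 4 moves right

print(binary, partial)
```

Under these assumptions the pass/fail version raises the puzzle to 1216, while the partial-credit version lowers it to 1192 even though the solver failed, which is the kind of downward drift described above.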

For more information try reading comments from some of these puzzles with lowest pass rate.



I wonder if this particular problem is rated so low because it's a one-move answer. Recapturing the queen doesn't really count as a move that needs calculating. 

Stronger players will probably be more drawn to a mating attack on the dark squares, but weaker players, who often play to win the queen rather than the king, might be looking to move the bishop immediately.

Perhaps it's one of the rare puzzles that can be "easier" for people in its rating range than for stronger players.

Not sure who it's being presented to and why it's such a low pass rate though.

The problem is all about the weak dark squares though. After 1.Bf1 the black queen is not trapped. The point is that it can't guard well against the mate.

In the text line, after 1...Qxe1 2.Rxe1 Rxe1 Black has won two rooks for the queen (normally good for Black). But after 3.Qh6, 3...Re5 is forced to stop mate, and after 4.Bxe5 dxe5 5.Qd2! White picks up a lot of material.

Similarly, after 1...Qd5 2.Qf6! (threatening mate) Re5 3.Rxe5! Qxd4 (3...dxe5 4.Bxe5) 4.Re8+ picks up the queen.

After 1...Qf5 Bd3 Qd5 is similar to the previous line.

Hence the lines are quite complex, and I wouldn't expect a 1200-rated player to see them.

Since the rating is still exactly 1200 (the pass rate has dropped to 7.1%, so people have been doing this problem), I conclude that although 1628 people have done this problem, its rating has remained fixed at 1200 as some sort of initial default. I can't see why this would be necessary or optimal.

I agree this problem is both somewhat technical to calculate properly and relatively difficult to get right by a lucky guess (Qh6 is surely the gambler's move!).


Still rated 1200, success rate 6.4% from 1787 users! Why don't they generate a rating in a sensible way?


That problem has 5.5% from 2074 attempts now!

(Edit: 4.8% from 2422. Still rated 1200. Why??)