meaninglessness of tactics trainer stats

chris_miner

If I can remember a problem from last time I saw it, then I'll use that knowledge to solve the problem.  Solving a tactics problem the very first time I've seen it is an entirely different situation.  I'm pretty sure including 2nd (3rd, 4th?) attempts in the stats makes them meaningless.

Each time a person sees the same puzzle they are likely to solve it more quickly.  This artificially lowers the average time and messes up the standard deviations.  Lots of problems are much easier to solve the second time you see them.

A problem's rating should be determined solely by the outcome of a person's first attempt at it.  I don't care if the problems come up again during training, but subsequent outcomes shouldn't be used to adjust either the puzzle's or the trainee's rating.
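
In code terms the fix is small. Here's a minimal sketch in Python of the rule I mean (the names and data model are invented for illustration; this is not chess.com's actual code):

    seen = set()     # (user_id, puzzle_id) pairs already attempted
    time_stats = {}  # puzzle_id -> list of first-attempt solve times

    def record_attempt(user_id, puzzle_id, solved, seconds):
        # Log every attempt, but let only first attempts move the stats.
        first_attempt = (user_id, puzzle_id) not in seen
        seen.add((user_id, puzzle_id))
        if first_attempt:
            time_stats.setdefault(puzzle_id, []).append(seconds)
            # ...the rating adjustment for player and puzzle would go here...
        # Repeat attempts still count as practice; they just don't touch
        # the puzzle's rating or its timing statistics.

Repeats could still be served for training; they'd just stop polluting the numbers.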

notmtwain
chris_miner wrote:

If I can remember a problem from last time I saw it, then I'll use that knowledge to solve the problem.  Solving a tactics problem the very first time I've seen it is an entirely different situation.  I'm pretty sure including 2nd (3rd, 4th?) attempts in the stats makes them meaningless.

Each time a person sees the same puzzle they are likely to solve it more quickly.  This artificially lowers the average time and messes up the standard deviations.  Lots of problems are much easier to solve the second time you see them.

A problem's rating should be determined solely by the outcome of a person's first attempt at it.  I don't care if the problems come up again during training, but subsequent outcomes shouldn't be used to adjust either the puzzle's or the trainee's rating.

There are 56,399 problems in the Tactics Trainer. How many people have seen them all 3, 4 or 5 times?

If you look at the chart I linked, which shows the ratings distribution of the problem set, more than 50% of the problems (31,243) are rated from 1150 to 1250. That probably means most problem ratings aren't affected by the few players who have become obsessive about Tactics Trainer and made several hundred thousand problem attempts, because their repeated attempts would tend to fall on the higher-rated problems.

baddogno

And then of course we have old codgers who can see a problem umpteen times and still can't get it right.

notmtwain

If you look at that chart you can also see the number of attempts made on each problem. All but 11 problems have fewer than 24,000 attempts.

The top 11 each have 175,000+ attempts. To me, that suggests most people try those first 11 problems and then give up.

When you have 11 million members, 175,000 attempts is not a lot. Evidently Tactics Trainer is not very popular; most of the 11 million haven't even tried it.

(Of course, chess.com's counters may be inaccurate. The number of attempts seems awfully low.)

chris_miner
I'm certainly not claiming that anyone has seen all the problems up to five times.  What I'm claiming is that people seeing a problem more than once skews the statistics.  That's factually true.
 
Anecdotally, if you read the comments you can see that some solvers go so far as to note they've managed the problem on their 2nd or 3rd try, which is what got me thinking about how repeats affect the average solution times and how problems and players are rated.
 
The bit about 11 million players vs. 175k attempts for a single problem is probably misleading.  There are 386 million total attempts for the 11 million members, so that's about 35 puzzle attempts per user.
 
But if you look at the graph's tool-tips you'll see the number of players in each 100-point rating zone, and those add up to 93k.  So that's 386 million / 93k, or about 4,150 attempts per tactics user.  There are only 56,399 puzzles.
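
You can check the arithmetic yourself (the inputs are the figures from the stats page and tool-tips; Python):

    total_attempts = 386000000  # total attempts shown on the stats page
    members = 11000000          # chess.com members
    tactics_users = 93000       # "93k" from the tool-tip totals (93,239 exactly)

    print(total_attempts / members)        # ~35 attempts per member
    print(total_attempts / tactics_users)  # ~4,150 attempts per tactics user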
 
People probably work on only a subset of those 56,399 puzzles, for example puzzles within ±50 points of their tactics rating.  The 3,528 players with a tactics rating between 350 and 450 might have about 23k puzzles between 400 and 500 to draw on.  I say might because I can only get up to page 100 in the puzzle listings.  I saw that one of those puzzles had 2,459 attempts, which is more anecdotal evidence that players are seeing the same puzzle more than once.  That skews the results.
 
I don't know how many problems there are in the 1150 to 1250 range; probably about the same number as in the 400-500 range.  But there you have 31,243 players.  218,767 attempts (on problem #793, rated 1173) spread over 31k players leads me to think there would be 2nd and 3rd attempts on this particular puzzle, especially as people stall in their progress.
notmtwain
chris_miner wrote:
I'm certainly not claiming that anyone has seen all the problems up to five times.  What I'm claiming is that people seeing a problem more than once skews the statistics.  That's factually true.

Anecdotally, if you read the comments you can see that some solvers go so far as to note they've managed the problem on their 2nd or 3rd try, which is what got me thinking about how repeats affect the average solution times and how problems and players are rated.

The bit about 11 million players vs. 175k attempts for a single problem is probably misleading.  There are 386 million total attempts for the 11 million members, so that's about 35 puzzle attempts per user.

But if you look at the graph's tool-tips you'll see the number of players in each 100-point rating zone, and those add up to 93k.  So that's 386 million / 93k, or about 4,150 attempts per tactics user.  There are only 56,399 puzzles.

People probably work on only a subset of those 56,399 puzzles, for example puzzles within ±50 points of their tactics rating.  The 3,528 players with a tactics rating between 350 and 450 might have about 23k puzzles between 400 and 500 to draw on.  I say might because I can only get up to page 100 in the puzzle listings.  I saw that one of those puzzles had 2,459 attempts, which is more anecdotal evidence that players are seeing the same puzzle more than once.  That skews the results.

I don't know how many problems there are in the 1150 to 1250 range; probably about the same number as in the 400-500 range.  But there you have 31,243 players.  218,767 attempts (on problem #793, rated 1173) spread over 31k players leads me to think there would be 2nd and 3rd attempts on this particular puzzle, especially as people stall in their progress.

The graph shows the ratings of the puzzles, not the players. I know the tool-tip labels describe sets of players, but that must be a mistake. The title of the graph is "Problems - Tactics Trainer", on a page about the puzzles.

It makes a lot of sense to conclude it is showing the ratings of the problems. It makes no sense to conclude that only 56,399 players have tried Tactics Trainer or that each tactics problem has had only 4,000 attempts.

chris_miner
notmtwain wrote:

The graph shows the ratings of the puzzles, not the players. I know the tool-tip labels describe sets of players, but that must be a mistake. The title of the graph is "Problems - Tactics Trainer", on a page about the puzzles.

It makes a lot of sense to conclude it is showing the ratings of the problems. It makes no sense to conclude that only 56,399 players have tried Tactics Trainer or that each tactics problem has had only 4,000 attempts.

I didn't claim there were only 56,399 players; that's the total number of puzzles.  I claimed there were about 93k tactics players (93,239), based on the info displayed in the tool-tips.

If the tool-tips were mislabeled, I'd expect their totals to add up to about 56k rather than 93k.  On the other hand, there is indeed only one puzzle in the 3150-3250 range, so maybe you're right.

I've just stumbled across the Players - Tactics Trainer page, where they list the total number of players as 252,782.  Also not 11 million.

But really... all this is neither here nor there.  The point of my post was that people see these puzzles more than once.  And including their 2nd attempts in the stats ruins the results.

Does anyone have a point of discussion related to that topic?

whirlwind2011

@OP: Attempting a problem more than once does not ruin the results. Second and ensuing attempts are what constitute practice of the concept(s), which is integral to mastery thereof. I'm confident that this is an intended feature of Tactics Trainer.

The keyword in your original post that drives your point is your first word: "If." It is a big "if."

chris_miner

So today I saw a puzzle with the exact same solution as one I've seen before.  I'm not sure if it was the exact same puzzle, since I don't have a history of all the puzzles I've done, but in any case the solution was the same.  The average time for the puzzle is 4:18; since I'd seen it before, I did it in 0:43.  Adding my result screws up the stats, because I've already seen the puzzle, or one almost exactly like it.  It's just wrong to adjust the problem's rating based on my performance.

Yesterday I got two problems in a row that had the exact same solution.  You can bet that the second puzzle was solved in half the time of the first.  They were certainly two different puzzles, since they were presented in the same session, but the solutions were exactly the same.  It doesn't do anyone any good to include my "second attempt" in the stats.

While seeing a problem twice may make me better at tactics, my second attempts shouldn't be included in adjusting the problem's rating.
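
To make the distortion concrete, here's a toy version of a puzzle's timing stats (the first-attempt times are invented; the 0:43 is mine):

    import statistics

    first_attempts = [250, 270, 240, 265, 255]  # invented first-try times, mean 4:16
    print(statistics.mean(first_attempts))      # 256 seconds
    print(statistics.stdev(first_attempts))     # ~12 seconds

    with_repeats = first_attempts + [43, 50, 38]  # mix in a few repeat solves like mine
    print(statistics.mean(with_repeats))          # ~176 seconds: the average collapses
    print(statistics.stdev(with_repeats))         # ~110 seconds: the spread explodes

A handful of repeat solves and both the average and the standard deviation stop describing the puzzle's real difficulty.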

@whirlwind2011, two points: 1. Repeating puzzles I've already seen will not help mastery of tactics half as much as being presented with new puzzles that happen to have the same solution.  Changing the context of the solution is important to promote learning.  2. The "if" obviously isn't big if I've already noticed it anecdotally in puzzle comments and witnessed it myself after doing only 522 puzzles.

JamieKowalski
notmtwain wrote:

If you look at that chart you can also see the number of attempts made on each problem. All but 11 problems have fewer than 24,000 attempts.

The top 11 each have 175,000+ attempts. To me, that suggests most people try those first 11 problems and then give up.

I have another theory.

When I use Tactics Trainer on my phone on the metro train, my connection often drops out as we go underground. When this happens, the app starts showing only the same small set of problems over and over (probably those 11). I think this behavior recently changed with the new iPhone update, but it did that for years.

I got really good at those 11 problems!

adumbrate

I also get the same ones a lot of the time, but we can't expect Tactics Trainer to have a billion problems. More come every day; that's just how it goes.

chris_miner
skotheim2 wrote:

I also get the same ones a lot of the time, but we can't expect Tactics Trainer to have a billion problems. More come every day; that's just how it goes.

And still, the way it works now is both broken and fixable, so why not fix it?

ResetButton

I don't think it's broken. The more you understand a certain pattern, the easier and faster you'll see it in-game. The more you practice it, the better imo. Even basic tactics are good for studying. All complicated positions consist of a plethora of small basic tactics woven together. Ratings don't matter either. I've got puzzles wrong that I've seen before. I can't be the only one.

Bobbarooski
ResetButton wrote:

I don't think it's broken. The more you understand a certain pattern, the easier and faster you'll see it in-game. The more you practice it, the better imo. Even basic tactics are good for studying. All complicated positions consist of a plethora of small basic tactics woven together. Ratings don't matter either. I've got puzzles wrong that I've seen before. I can't be the only one.

I agree with this point. I used to get hung up on the statistics, but the ultimate aim of Tactics Trainer (at least for me) is to develop my pattern recognition skills so I can see the tactics in my games.  Tactics rating points don't mean anything to me anymore. 

chris_miner

The rating points are used to determine which puzzles you see.  If the ratings are skewed, you'll see the wrong puzzles for your tactics level.  Google "zone of proximal development" if you don't know why this matters.

whirlwind2011

@OP: The selection of TT problems is randomized and not scientific.

A common and accepted method of reinforcement of concepts is to repeat a problem that hasn't been seen or done in a while, to see if the participant can solve it the second time. Many times, he cannot, indicating that the problem was not redundant. One methodology involves continually repeating the problem at certain intervals (i.e., every few days or weeks) until the participant successfully solves the problem, ideally cementing the participant's mastery of the concept.
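
Schematically, that interval method looks like this (the gap lengths are illustrative, not taken from any particular study):

    from datetime import date, timedelta

    def review_dates(first_seen, gaps_in_days=(3, 7, 14, 30)):
        # Re-show the problem at growing intervals until it is solved;
        # the gap lengths here are illustrative only.
        due = first_seen
        for gap in gaps_in_days:
            due += timedelta(days=gap)
            yield due

    for d in review_dates(date(2015, 1, 1)):
        print(d)  # 2015-01-04, 2015-01-11, 2015-01-25, 2015-02-24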

Two problems cannot have the same solution unless they are the same problem, having the same position. If the position is different, or if the problem is somehow different in any way, then the two solutions cannot be truly the same, even if the solution to both problems is, for example, Qb3-f7+.

chris_miner

The selection isn't just randomized; it is also pooled around your current rating (i.e., TT problems within your ZPD are presented).  From the online help:

The next tactic for you is chosen randomly from within a pool of tactics that are within a rating band close to your current rating.
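
Which I picture as something like this (the band width is my guess; the help text doesn't give the exact number):

    import random

    def next_tactic(player_rating, puzzles, band=50):
        # Random pick from puzzles near the player's rating; the +-50
        # band is an assumption, not a published figure.
        pool = [p for p in puzzles if abs(p["rating"] - player_rating) <= band]
        return random.choice(pool) if pool else None

So if the puzzle ratings drift downward, the pool is systematically harder than the band implies.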

I'm intimately familiar with the concepts of spaced repetition.  I realize that it is helpful to me if I'm presented with the same problem after some time to reinforce learning.  I also realize that it is even more helpful to me if that problem is presented in a different context.  And I further realize that it is helpful to me to present problems in my ZPD.

It is, however, not helpful to anyone else if my second and third tries on a puzzle, or on a similar puzzle, skew that puzzle's legitimate rating.  Assuming I've learned anything, those repeats would tend to drive the puzzle's rating down.

It is also not helpful to me if anyone else's second and third tries skew a puzzle's legitimate rating.  That would naturally lead to me being presented puzzles that are actually much harder than their ratings imply.

That means the current system is broken.  I can't imagine why anyone would argue against fixing it.

whirlwind2011

@OP: Ratings are mere guidelines. They do not need any exact value, so they are not skewed to the detriment of future solvers. Thus one cannot prove that any rating is illegitimate; one can only dislike the rating method.

Ratings are known to have a margin of error (given as the RD, or rating deviation, on Chess.com), so the notion of a single legitimate rating is hypothetical.

Could you give a concrete example? You saw TT problem #00XXXXX three weeks ago and failed it. Then you saw it yesterday and you A) solved it correctly; or you B) failed it again. For both scenarios, explain the ramifications of the effect your second attempt had on the rating for any future solvers of the problem.

chris_miner

As with a grandmaster's Elo rating, these ratings are not mere guidelines.  They are adjusted based on whether the puzzle is solved, how quickly it is solved, and the ratings of the player and the puzzle at the time of the attempt.

The second time I see a puzzle, or a very similar one, my result is distorted.  (Today I saw two pin-the-queen-to-the-king puzzles, and two smothered-mate puzzles where you sacrifice your queen to draw a rook in next to the king so the knight can mate in the corner.)

The smothered mates weren't the exact same puzzles, but they had the exact same solutions, and I saw them within 6 minutes of each other.  The 1st smothered mate problem (#130749) took me 18 seconds to recognize and solve.  The 2nd (#120903) took me only 8 seconds, because I was already primed to recognize and solve that kind of puzzle.  The average for the 2nd puzzle is 30 seconds.  My solution of that 2nd puzzle has now artificially pulled its rating down by 9 points, and for that matter artificially raised my tactics rating by 9 points.
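
That 9 is no mystery; it's exactly what a textbook Elo update gives for an even pairing with K = 18 (the K and the formula are my guesses, since chess.com doesn't publish its Tactics Trainer math, and I'm ignoring the solve-time weighting here):

    def elo_delta(player_rating, puzzle_rating, solved, k=18):
        # Textbook Elo expected score; chess.com's actual formula, K,
        # and time bonus are unpublished, so this is an approximation.
        expected = 1 / (1 + 10 ** ((puzzle_rating - player_rating) / 400))
        return k * ((1.0 if solved else 0.0) - expected)

    print(elo_delta(1100, 1100, solved=True))  # 9.0: I gain 9, puzzle #120903 loses 9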

Ramifications:

Future solvers of #120903 will be presented with a puzzle thought to have a rating of 1100 but which is likely a tick harder than that.  As such, the puzzle may be outside their ZPD and won't be as helpful to their development as other puzzles would be.

For me, my inflated tactics rating will mean being presented with more difficult puzzles than would be helpful in developing my tactics skills.

In a nutshell, I'm now over-rated and the puzzles are under-rated. In the long term we should see ever-inflating tactics ratings combine with ever-deflating puzzle ratings, until nobody is being presented with puzzles at or near their actual skill level.  That's bad for everyone.
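
If you doubt the drift, a crude simulation (every parameter invented) shows the direction of travel:

    import random

    def simulate(rounds=10000, k=18, repeat_rate=0.3, repeat_boost=0.35):
        # One player vs. one puzzle, each reduced to a single rating.
        # Repeats succeed more often than first tries; all numbers invented.
        player, puzzle = 1200.0, 1200.0
        for _ in range(rounds):
            expected = 1 / (1 + 10 ** ((puzzle - player) / 400))
            p_solve = expected
            if random.random() < repeat_rate:  # this attempt is a repeat
                p_solve = min(0.95, expected + repeat_boost)
            solved = 1.0 if random.random() < p_solve else 0.0
            player += k * (solved - expected)
            puzzle -= k * (solved - expected)
        return round(player), round(puzzle)

    print(simulate())  # roughly (1450, 950): player up, puzzle down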

whirlwind2011

@OP: The two puzzles you mentioned share the exact same theme of solution, but not the same solution. The final move may be written in identical fashion (Nf2# or Nh3-f2#), but that does not mean they have the same solution.

Both puzzles utilize exactly the same idea of a Queen sacrifice (Qg1+), enemy Rook entombing its King (Rxg1), and smothered mate (Nf2#). However, considering them the same actual solution is fallacious, because the starting positions were unique.

Because the positions are different, they are considered two distinct puzzles. Your solving the first one helped you solve the second, just as your solving both puzzles will now help you to increase your chances of finding the tactical theme in an actual game, thereby winning.

If you were to now play in an official tournament, your enhanced knowledge of this theme could give you an edge over your opponents, which would affect your tournament rating. Consequently, you would be paired later versus slightly harder opponents, while your opponents would then be paired with slightly easier opponents (by virtue of their rating decreasing). If those opponents discovered that you used newfound TT knowledge of the smothered mate against them effectively, could they then complain that your study methods tainted your victory? (After all, had you not studied TT and sharpened your understanding of smothered mate, you might not have won some of those games.) Of course not!

So it is with TT. If future solvers of TT #0120903 find its rating too low, then more of them will fail it, pushing its rating back up. This is the intended function of ratings. They are guidelines, intended to fluctuate according to the strengths and needs of players. The notion that any rating (even a grandmaster's) is a scientifically determined, exact value is simply incorrect. After all, that's why it is adjusted frequently.