You may find some answers at rybkaforum.net. It's very possible that some engine designers have already worked on this very same line of reasoning.
Mining critical positions from "N Best" engine evaluation scores?

Well, let me try and break this down.
My understanding is that trivial recaptures are those "I'll take yours after you take mine" moves that can be considered fairly obvious if the position is otherwise quiescent (no other tactical complications beyond the "Piece A takes Piece B" activity). In this situation, failing to recapture would be considered losing by the engine's evaluation score, so the 2nd/3rd best moves would have scores that swing (by a large value) away from the "recapture" move denoted as best. The 4.Bxc6 dxc6 in the Exchange Ruy would be a simple example.
Now if you choose to sacrifice material, there are two possibilities:
i) The engine deems your sacrifice as best. (rare in non-tactical positions, as far as my understanding goes about engines and positional sacs)
ii) The engine thinks your move is far from what it deems best. (more likely)
In (i), if the engine saw (hard to imagine, in the examples you cited!) that the sac was best and that no other move could "hold" the evaluation of the position (i.e. it would no longer stay +-, +/- etc. if anything else were played), I'd expect that position to be flagged as "critical".
In (ii), I would assume that the other player's "to move" position would now be collected as critical, as the engine has found a move (with a P.V.) that busts the sacrifice with a score that far exceeds the 2nd/3rd eval scores. This may be completely erroneous on the part of the engine (which doesn't really "get" the point of the sac), but the tool would nevertheless still log it (for the wrong reasons!)
Though in both cases, I'd say our "Crit. position parser" would still collect them ... and as I mentioned before (limitation b.), they may or may not be "noise" for the purposes of our study ... not sure how the tool would be able to make that determination on its own.
To summarize, the answer to your question would be: the tool would collect all these cases regardless of whether they were trivial recaptures or positional sacrifices that the engine just couldn't understand.
While this sucks, the intention of the tool is to quickly "parse" out positions of potential interest in an automated manner ... leaving the "is this actually interesting" question to us.
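For what it's worth, the "I'll take yours after you take mine" pattern could at least be tagged automatically, even if the tool can't judge whether the position is interesting. Here is a rough sketch, assuming the tool has the previous move and the engine's best move as SAN strings (the function names are made up for illustration). It only checks that both moves are captures landing on the same square; it does not check that the position is otherwise quiescent, so treat it as a label for us to review rather than a filter.

```python
import re

# Destination square of a SAN move, e.g. 'dxc6' -> 'c6', 'Bxc6+' -> 'c6'.
# Allows an optional promotion suffix, check/mate sign and !/? annotations.
SQUARE_RE = re.compile(r"([a-h][1-8])(?:=[QRBN])?[+#]?[!?]*$")


def target_square(san):
    """Return the destination square of a SAN move, or None (e.g. castling)."""
    match = SQUARE_RE.search(san)
    return match.group(1) if match else None


def looks_like_recapture(previous_san, best_san):
    """True if best_san captures on the square where previous_san just captured.

    Heuristic only: says nothing about other tactics in the position.
    """
    if "x" not in previous_san or "x" not in best_san:
        return False
    square = target_square(previous_san)
    return square is not None and square == target_square(best_san)


# The Exchange Ruy example: 4.Bxc6 followed by the "obvious" 4...dxc6.
recapture_flag = looks_like_recapture("Bxc6", "dxc6")
```

A flagged position would still be collected, just with a "probable recapture" note next to it.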
Re-visiting an old chess app. tool I was writing a while ago ... had a question for those who know a thing or two about chess engine evaluation scores.
First of all, let's call a critical position in chess one where it is exceedingly important (to maintain the state, i.e. winning/drawing) to find the best move. In other words, 2nd/3rd best moves don't cut it and are likely to change the state of the game (+- becomes = or even -+ !)
These critical positions could be tactical (as you might have already figured out), key endgame positions or even those where a specific positional/strategic move MUST be played (which we're hoping a decent engine can see clearly, given enough time!)
Assume we've given a reputable engine sufficient time to get a reasonably accurate "best 3 moves" analysis with eval. scores.
My question: If I were to parse through such an "engine-analyzed" PGN file, could I accurately "extract" all critical positions based on the differences in evaluation score between the "top" move and the 2nd + 3rd moves?
For example, the PGN snippet could read
24. Nd4
{
24. Nd4 Bxd4 .. (+- 2.40 ) <-------------- Evidently a critical position?
24. Bd4 Bxd4 ...(+= 0.50)
24. e5 0-0 ... (= 0.02)
}
These differences in eval. scores would need to be large enough to change the "who is winning and by how much (+-, +/-, = etc.)" factor if the "best engine" move was not played. That way, trivially winning positions (where the +3.4 and +6.0 eval-scored moves both win!) can be ignored.
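To make the idea concrete, here is a minimal sketch of the check I have in mind. Everything in it is an assumption for illustration: the line format is just the snippet above, the band cut-offs (in pawns) and the `min_gap` threshold are numbers I picked, and scores are taken from the side to move's point of view.

```python
import re

# Hypothetical cut-offs (in pawns, side to move's view) for the usual symbols.
BANDS = [(1.5, "+-"), (0.7, "+/-"), (0.25, "+="),
         (-0.25, "="), (-0.7, "=+"), (-1.5, "-/+")]

MOVE_RE = re.compile(r"^\s*\d+\.+\s*(\S+)")
# Score inside parentheses, optionally preceded by a symbol such as '+-'.
SCORE_RE = re.compile(r"\(\s*(?:[^\s\d]+\s+)?([+-]?\d+(?:\.\d+)?)\s*\)")


def band(score):
    """Map a numeric score to a coarse 'who is winning' symbol."""
    for floor, symbol in BANDS:
        if score >= floor:
            return symbol
    return "-+"


def parse_line(line):
    """Return (first_move, score) from e.g. '24. Nd4 Bxd4 .. (+- 2.40 )'."""
    move = MOVE_RE.match(line)
    score = SCORE_RE.search(line)
    if not move or not score:
        return None
    return move.group(1), float(score.group(1))


def is_critical(scores, min_gap=0.5):
    """scores: evals of the N best moves, best first.

    Critical if the 2nd best move lands in a different band than the best
    move and the raw gap is at least min_gap.
    """
    if len(scores) < 2:
        return False
    best, second = scores[0], scores[1]
    return band(best) != band(second) and best - second >= min_gap


snippet = [
    "24. Nd4 Bxd4 .. (+- 2.40 )",
    "24. Bd4 Bxd4 ...(+= 0.50)",
    "24. e5 0-0 ... (= 0.02)",
]
parsed = [parse_line(line) for line in snippet]
flagged = is_critical([score for _, score in parsed])
```

With these cut-offs the 24. Nd4 position gets flagged (+- drops to += on the 2nd best move), while a position where the top two moves score +6.0 and +3.4 does not, since both sit in the same "+-" band.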
I see the following limitations that would hurt accuracy and/or efficiency.
a) Engines tend to be materialistic and often clueless about positional chess so eval scores may not hold up in non-tactical/tablebase positions.
b) Trivial recaptures will get included and may "noise" up the list of actual critical positions that may be instructional.
c) Following up on a), I acknowledge that ideas/plans in chess don't often come in "single" moves and certain ideas that a game presents cannot possibly be simplified in terms of "one" key move.
Thanks for reading through this ... wanted to see if there was value in an "extraction" tool like this, assuming there wasn't one already?
I know Fritz's blundercheck extracts "blunders" (if they are made) and may be using some similar algorithm in its touted "auto analysis" tools to give people the ??/?/!/!! warm fuzzies.
Though I wondered if it would be neat to have an automated tool that could go over a Master game and point out all the "key" positions that were worth looking at? Or even statistically analyze how big of a minefield certain "explosive" openings like the Scotch Gambit/Max Lange attack are :)