Perhaps the player ratings are a consideration?
Same moves get rated differently depending on the game?

I thought about that, but it doesn't work out. It would make sense if the evaluation were less harsh at lower levels, but in the first example it's the opposite.
Also, games 3 and 4 are both at 1300.

Might be some "fine tuning" of the algorithm in the time between games. A move's classification is a judgment call, and the criteria sometimes change over time.

Well, if those games were from different times, then the game review's algorithm might have been updated.
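
To make the "criteria changed" idea concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the labels, the `classify` helper, and the centipawn cutoffs in `CUTOFFS_V1` and `CUTOFFS_V2` are made up and are not the site's actual criteria. It only shows that a move sitting near a cutoff can change label if the cutoffs move, or if two analysis runs disagree slightly on how much the move loses.

```python
from bisect import bisect_right

# Hypothetical cutoffs, in centipawns lost relative to the engine's top move.
# These numbers are invented for illustration, not any site's real criteria.
LABELS = ["best", "good", "inaccuracy", "mistake", "blunder"]
CUTOFFS_V1 = [10, 50, 100, 200]
CUTOFFS_V2 = [10, 40, 90, 180]


def classify(cp_loss, cutoffs):
    """Return the label of the band that cp_loss falls into."""
    return LABELS[bisect_right(cutoffs, cp_loss)]


# Same move, same loss, two different sets of criteria.
print(classify(95, CUTOFFS_V1))   # inaccuracy
print(classify(95, CUTOFFS_V2))   # mistake

# Same criteria, but two runs that disagree slightly on the loss.
print(classify(195, CUTOFFS_V1))  # mistake
print(classify(205, CUTOFFS_V1))  # blunder
```

If the real review works anything like this, moves far from a cutoff would keep their label, and only borderline ones would flip between games.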

I encountered this a while back. In a game review, one move was suggested as best, with a positive evaluation of the position. The suggested move seemed strange to me and so I played the game through on an Analysis board. Here, the suggested move was considered a mistake, leading to a negative evaluation. This was just a few minutes after the game review, so it is unlikely that a software update occurred.
So, the all-knowing Stockfish is not infallible.
It is not infallible. Honestly, only (sometimes) listen to it in openings, but tbh Stockfish is pretty dumb.
And there's the fact that Stockfish runs differently each time it "reviews" a game. It's like it's running the game on separate computers or something. One of the mistakes people make is listening to Stockfish on every move.
I've been trying not to listen to Stockfish recently when I review my games (I don't really listen to it because it's dumb, but I use it sometimes). I'm just trying to get to the point where I don't use it at all.

first game: h3 is a great move, Nd5 is best, Nxe5 is good
second game: h3 is best, Nd5 is great, Nxe5 is an inaccuracy
third game: h3 is good, Nd5 is best, Nxe5 is best 🤯
I've noticed that in openings the exact same move may be rated differently in different games.
Here are 2 identical games, but in the first game Qf3 is a blunder and in the second one it's just a mistake.
Another example:
2 games have Qe7 as a great move, one has it as just best
2 have d3 as a mistake, one as a blunder