An ELO number alone may not be enough; more parameters would be needed to modify the evaluation. From my own games I find it depends on the position and on how much strategy versus tactics the position demands. A player would also need to play enough games for the algorithm to learn their style, and even then some games are played fully focused while others are rushed because the kids are banging on the bathroom door.
An interesting format I have seen adjusts the starting material based on the ELO mismatch so that both players have a fair chance of winning, which in effect predicts how the game will play out based on the players' ELO.
Stockfish (as well as many other engines) can provide a win-draw-loss (or expected game score) evaluation of a position. This is of course based on the engine playing against itself; a human player's success rate would differ depending on their skill level.
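For reference, Stockfish exposes these statistics through its UCI_ShowWDL option. Here is a minimal sketch using python-chess, assuming the library is installed and a Stockfish binary is available (the path below is just an example):

```python
# Minimal sketch, assuming python-chess is installed and the Stockfish
# binary lives at the path below (the path is illustrative).
import chess
import chess.engine

STOCKFISH_PATH = "/usr/local/bin/stockfish"  # adjust for your system

with chess.engine.SimpleEngine.popen_uci(STOCKFISH_PATH) as engine:
    # Ask the engine to report win/draw/loss statistics with each score.
    engine.configure({"UCI_ShowWDL": True})

    board = chess.Board()  # starting position; any FEN works here
    info = engine.analyse(board, chess.engine.Limit(depth=20))

    score = info["score"].white()   # centipawn score from White's view
    wdl = info["wdl"].white()       # engine self-play W/D/L, per mille

    print("score:", score)
    print("win/draw/loss (per 1000):", wdl.wins, wdl.draws, wdl.losses)
    print("expected game score:", wdl.expectation())
```

The numbers reported this way reflect the engine's own playing strength, which is exactly the limitation mentioned above.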
It would be very interesting to obtain statistical data showing win-draw-loss probabilities (or the expected game score) for a human player with a given ELO rating as a function of the position evaluation obtained from Stockfish or another strong engine. For low rating levels the confidence of such an estimate would be very low, and it would improve substantially at higher ratings.
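To make the idea concrete, here is a rough sketch of the tallying I have in mind, written in Python with python-chess. It assumes a PGN database whose moves already carry [%eval ...] annotations (the Lichess open database is one such source); the file name, rating bands, and centipawn buckets are purely illustrative, and for simplicity it only looks at the game from White's side:

```python
# Rough sketch of the proposed analysis, not a finished tool. Assumes the
# PGN file already contains [%eval ...] annotations on the moves.
import chess.pgn
from collections import defaultdict

RESULT_SCORE = {"1-0": 1.0, "1/2-1/2": 0.5, "0-1": 0.0}  # from White's view

def bucket(elo, eval_cp, elo_step=200, eval_step=50):
    """Group positions into (rating band, centipawn band) cells."""
    return (elo // elo_step * elo_step, eval_cp // eval_step * eval_step)

def tally(pgn_path):
    totals = defaultdict(lambda: [0.0, 0])  # cell -> [sum of scores, count]
    with open(pgn_path) as f:
        while (game := chess.pgn.read_game(f)) is not None:
            result = RESULT_SCORE.get(game.headers.get("Result", "*"))
            elo = game.headers.get("WhiteElo")
            if result is None or elo is None:
                continue
            for node in game.mainline():
                ev = node.eval()  # parsed from [%eval ...] comments, if any
                if ev is None or ev.is_mate():
                    continue
                cell = bucket(int(elo), ev.white().score())
                totals[cell][0] += result  # credit the final game result
                totals[cell][1] += 1
    return totals

# Expected score for a ~1600 player in positions evaluated around +1.00:
# counts = tally("games_with_evals.pgn")
# s, n = counts[(1600, 100)]
# print(f"expected score {s / n:.2f} over {n} positions")
```

Each (rating band, evaluation band) cell would then give an empirical expected score, which could also be split into separate win/draw/loss counts if finer detail is wanted.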
The idea seems to lie on the surface and is not difficult to implement (although it likely requires appreciable computing resources to run the analysis over a large database of games). I have not been able to find such an analysis in the public domain, perhaps because I am not formulating my search queries correctly. Does anyone have references, or can anyone suggest why this is not a valid problem to solve?
I strongly suspect that the expected points model that chess.com describes in general terms as the basis of their game review is built on something similar. Even if so, the data behind this model is not publicly available (or I don't know where to look).