Estimating Elo Ratings from Evaluation Graph Area (Chess Analysis)

AyushiKrishnan

Hey everyone 👋

I’ve been working on an idea to estimate the Elo strength of chess players from the area under their evaluation graph, like the ones you see on Lichess or from Stockfish analysis. This isn’t a replacement for actual Elo systems, but a heuristic approximation based on engine evals over the course of a game.

 
🎯 The Problem
Can we use the centipawn advantage graph (evaluation vs. move number) to estimate:

How well each player performed overall
An approximate Elo difference
Possibly even their actual ratings (if we assume a base level)
 
📈 The Concept
The idea is simple:

Each half-move (ply) has a centipawn evaluation (positive = White better, negative = Black better).
By calculating the "area under the graph" for each side — that is, summing the evaluations from the plies where that player stood better — we can estimate who dominated the game and by how much.
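For instance, an eval sequence of [50, -20, 120] contributes 50 + 120 = 170 cp to White's area and 20 cp to Black's, so White spent more of the game on top.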
 
🧮 The Algorithm
def calculate_elo_from_eval_graph(white_eval, black_eval, base_elo=1500):
    # Step 1: "Area under the graph" -- sum the evals where each side stood better
    white_area = sum(max(0, cp) for cp in white_eval)
    black_area = sum(max(0, -cp) for cp in black_eval)

    # Step 2: Normalize each side by its own move count (guard against empty input)
    white_avg = white_area / max(1, len(white_eval))
    black_avg = black_area / max(1, len(black_eval))

    # Step 3: Eval difference in centipawns
    delta = white_avg - black_avg

    # Step 4: Convert to Elo diff
    # 100 cp ≈ 200 Elo, so 1 cp ≈ 2 Elo (rough approximation)
    elo_diff = delta * 2

    # Step 5: Split the difference symmetrically around the base Elo
    white_elo = base_elo + elo_diff / 2
    black_elo = base_elo - elo_diff / 2

    return round(white_elo), round(black_elo)

 
✅ Example
white_eval = [50, 60, 80, 100, 120, 150, 180, 200, 220, 250]
black_eval = [0, -10, -20, -30, -40, -60, -80, -90, -100, -110]

white_elo, black_elo = calculate_elo_from_eval_graph(white_eval, black_eval)
# Outputs: (1587, 1413)
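
Working it through by hand: White's favorable evals sum to 1410 cp over 10 plies (141 cp average) and Black's to 540 cp (54 cp average). The 87 cp gap becomes a 174-point Elo difference, split evenly around the 1500 base.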

 
📌 Assumptions & Notes
This method assumes reasonably accurate engine evaluations (like from Stockfish 12+).
A consistent positional advantage throughout the game is treated as a sign of higher strength.
It doesn't account for tactical accuracy (missed mates, tactics) unless reflected in the evals.
You can extract eval data from PGNs using tools like python-chess + Stockfish; a sketch follows below.
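
Here's a minimal sketch of that extraction step, assuming python-chess is installed, a Stockfish binary lives at /usr/bin/stockfish (adjust the path to your setup), and the game is saved as game.pgn; it scores each position from White's perspective:

import chess
import chess.engine
import chess.pgn

# Paths are assumptions -- point these at your own PGN and Stockfish binary.
with open("game.pgn") as pgn:
    game = chess.pgn.read_game(pgn)

engine = chess.engine.SimpleEngine.popen_uci("/usr/bin/stockfish")

evals = []
board = game.board()
for move in game.mainline_moves():
    board.push(move)
    info = engine.analyse(board, chess.engine.Limit(depth=15))
    # Centipawn score from White's perspective; mates clamped to +/-1000
    evals.append(info["score"].white().score(mate_score=1000))
engine.quit()

# One simple way to split by side: evals[0::2] are positions after White's
# moves, evals[1::2] after Black's. How you split is itself a modeling choice.
white_eval = evals[0::2]
black_eval = evals[1::2]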
 
🔧 Possible Improvements
Smooth the evals with a moving average or integration techniques (e.g. Simpson's rule).
Factor in blunder count or volatility (see the sketch after this list).
Use machine learning regression if you have a dataset of real Elo and corresponding graphs.
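
As a rough illustration of the first two ideas, here's a sketch; the 5-ply window and the 150 cp blunder threshold are arbitrary assumptions, not tuned values:

def moving_average(evals, window=5):
    # Average each eval with its neighbours; windows shrink at the edges
    smoothed = []
    for i in range(len(evals)):
        lo = max(0, i - window // 2)
        hi = min(len(evals), i + window // 2 + 1)
        chunk = evals[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

def count_blunders(evals, threshold=150):
    # A "blunder" here is any ply where the eval swings by more than the threshold
    return sum(1 for prev, curr in zip(evals, evals[1:])
               if abs(curr - prev) > threshold)

One option would be to feed the smoothed series into calculate_elo_from_eval_graph and dock each player some Elo per blunder.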
 
Would love to hear what the community thinks!
Does this seem like a valid approach for approximating strength, especially for post-game analysis or anonymous players?

Let me know your thoughts or improvements. 🙌
