A Bayesian ELO Inference Metric for CAMS II

By Jordi_Agost

CAMS II: Bayesian ELO Metric

Good morning, detectives. Today I want to share an update about a new metric that we are going to incorporate into CAMS II.

CAMS II comes with a Bayesian metric that infers a player's rating strength (ELO) from move quality. The approach combines a discrete reference model of centipawn-loss (CPL) distributions per ELO bin with a likelihood-based posterior over those bins, complemented by a one-sided floor test that answers the question: “What is the minimum ELO we can statistically sustain?” We validate it on a large-scale Lichess corpus of approximately 2,500,000 games, aggregated into fixed-size batches within the 700 to 2799 range, and report aggregate performance of MAE = 77.3 ELO, RMSE = 104.6 ELO, and exact-bin accuracy ≈ 0.4.
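To make the pipeline in the paragraph above more concrete, here is a minimal Python sketch of a posterior over ELO bins. It rests on assumptions the post does not spell out: CPL is discretized into the hypothetical buckets `CPL_EDGES`, each bin's reference model is a smoothed histogram over those buckets, moves are treated as independent, the prior is uniform, and the "floor" is read off as the highest bin whose upper-tail posterior mass is at least 1 − α. The numbers in `REF` are invented, the function names are hypothetical, and CAMS II's actual floor test may be built differently.

```python
import math
from typing import Dict, List, Optional, Sequence

# Hypothetical CPL buckets (upper edges, in centipawns). The post does not
# say how CAMS II discretizes centipawn loss; these edges are illustrative.
CPL_EDGES = [10, 25, 50, 100, 200, math.inf]

BIN_WIDTH = 100  # ELO bins are 100 points wide, keyed by their lower edge


def cpl_bucket(cpl: float) -> int:
    """Map one move's centipawn loss to a bucket index."""
    for i, edge in enumerate(CPL_EDGES):
        if cpl < edge:
            return i
    return len(CPL_EDGES) - 1


def posterior(cpls: Sequence[float],
              ref: Dict[int, List[float]],
              prior: Optional[Dict[int, float]] = None) -> Dict[int, float]:
    """P(bin | moves), treating moves as independent draws from each bin's
    CPL distribution. Every ref[b][k] must be > 0 (smooth the histograms)."""
    bins = sorted(ref)
    if prior is None:
        prior = {b: 1.0 / len(bins) for b in bins}  # uniform prior over bins
    log_post = {}
    for b in bins:
        lp = math.log(prior[b])
        for c in cpls:
            lp += math.log(ref[b][cpl_bucket(c)])  # log-likelihood of each move
        log_post[b] = lp
    # Normalize in log space (log-sum-exp) to avoid underflow on long batches.
    top = max(log_post.values())
    weights = {b: math.exp(lp - top) for b, lp in log_post.items()}
    total = sum(weights.values())
    return {b: w / total for b, w in weights.items()}


def posterior_mean(post: Dict[int, float]) -> float:
    """Point estimate: posterior-weighted average of bin centers."""
    return sum(p * (b + BIN_WIDTH / 2) for b, p in post.items())


def elo_floor(post: Dict[int, float], alpha: float = 0.05) -> int:
    """Highest bin b with P(ELO >= b) >= 1 - alpha (a one-sided lower bound)."""
    bins = sorted(post)
    tail = 0.0
    for b in reversed(bins):  # accumulate mass from the top bin downwards
        tail += post[b]
        if tail >= 1 - alpha:
            return b
    return bins[0]


# Toy reference model, invented numbers, only three bins:
# P(CPL bucket | ELO bin), one probability per bucket in CPL_EDGES.
REF = {
    1200: [0.30, 0.20, 0.18, 0.15, 0.10, 0.07],
    1600: [0.40, 0.22, 0.16, 0.11, 0.07, 0.04],
    2000: [0.50, 0.22, 0.13, 0.08, 0.05, 0.02],
}

# A batch of mostly accurate moves (CPL values in centipawns).
moves = [3, 5, 8, 2, 12, 4, 6, 30, 1, 7] * 5
post = posterior(moves, REF)
print(max(post, key=post.get), round(posterior_mean(post)), elo_floor(post))
```

Working in log space matters here: a batch of a few hundred moves multiplies hundreds of probabilities below 1, which would underflow to zero if done directly. Lowering `alpha` makes the floor more conservative, since more posterior mass must sit at or above the reported bin.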


                                  (Click on the image or here to open the full 4-page document.)


For those of you who are not interested in the mathematics, here are the results.

Overall Performance

MAE ≈ 77 ELO → On average, the system is about 77 points off from the actual rating.
→ For example, if a player is rated 1800, CAMS II's estimate is typically off by about 77 points, i.e. somewhere around 1720 to 1880.

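RMSE ≈ 105 ELO → It penalizes large errors more heavily than MAE, yet stays fairly close to it (105 vs. 77), which suggests that large misses are relatively rare and the estimates are consistent overall.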

Exact-bin accuracy ≈ 0.38 → The estimate lands in the correct 100-point bin almost 4 out of 10 times.
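For readers who want to check how these three numbers relate, here is a minimal Python sketch that computes them from paired true and estimated ratings. It assumes the 100-point bins are aligned to multiples of 100 (700–799, 800–899, and so on), which fits the 700 to 2799 range but is not stated explicitly; `evaluate` is a hypothetical helper name, not part of CAMS II.

```python
import math
from typing import Sequence, Tuple

BIN_WIDTH = 100  # assumed: bins aligned to multiples of 100


def evaluate(true_elos: Sequence[float],
             pred_elos: Sequence[float]) -> Tuple[float, float, float]:
    """Return (MAE, RMSE, exact-bin accuracy) for paired ratings."""
    if len(true_elos) != len(pred_elos) or not true_elos:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(true_elos)
    errors = [p - t for t, p in zip(true_elos, pred_elos)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)  # squaring weights big misses
    same_bin = sum(1 for t, p in zip(true_elos, pred_elos)
                   if t // BIN_WIDTH == p // BIN_WIDTH)
    return mae, rmse, same_bin / n


# An 1800 player estimated at 1723, 1877, 1810 and 1850:
mae, rmse, acc = evaluate([1800] * 4, [1723, 1877, 1810, 1850])
print(mae, round(rmse, 1), acc)  # → 53.5 60.1 0.75
```

The example also shows the arithmetic behind the 1800 illustration: a 77-point miss in either direction puts the estimate at 1723 or 1877, and only the low one leaves the true bin. With the reported figures, RMSE/MAE = 104.6/77.3 ≈ 1.35, a little above the ≈1.25 ratio (√(π/2)) expected for normally distributed errors, which hints at a modest tail of larger misses.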

These values come from ~2.5 million Lichess games, grouped into fixed-size batches.


An average error of about 77 ELO is more than acceptable for a purely positional estimator: it doesn't consider wins, tempo, or openings, only the quality of the moves.

The per-bin consistency shows that the method doesn't "collapse" in any rating range: the curve is smooth and stable.

CAMS II doesn't have a release date yet, as it's still in beta, but I hope it will be a major improvement over some of the limitations of the first version.

See you next time!

JA