
A Bayesian ELO Inference Metric for CAMS II
Good morning, detectives. Today I'm sharing an update on a new metric that we are going to incorporate into CAMS II.
CAMS II comes with a Bayesian metric to infer player rating strength (ELO) from move quality. The approach combines a discrete reference model of centipawn-loss (CPL) distributions by ELO bin with a likelihood-based posterior over bins, complemented by a one-sided floor test that answers: “What is the highest ELO we can statistically sustain as a minimum?” We validate on a large-scale Lichess corpus of approximately 2,500,000 games, aggregated into fixed-size batches within the 700–2799 range, and report aggregate performance of MAE = 77.3 ELO, RMSE = 104.6 ELO, and exact-bin accuracy ≈ 0.38.
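To make the idea concrete, here is a minimal sketch of a likelihood-based posterior over ELO bins followed by a one-sided floor. It is not the CAMS II implementation: the CPL buckets, the reference probabilities, the uniform prior, the `alpha` threshold, and the function names `posterior_over_bins` and `floor_elo` are all assumptions made up for illustration, and the floor rule used here (highest bin whose posterior tail mass reaches 1 - alpha) is only one plausible reading of the floor test.

```python
import math

# Hypothetical reference model (made-up numbers): for each ELO bin, keyed by
# its lower edge, the probability that a move's centipawn loss falls into
# each CPL bucket. Assumed buckets: 0-9, 10-49, 50-99, 100+ centipawns.
REFERENCE = {
    1000: [0.40, 0.25, 0.15, 0.20],
    1500: [0.50, 0.25, 0.13, 0.12],
    2000: [0.60, 0.24, 0.10, 0.06],
    2500: [0.70, 0.20, 0.07, 0.03],
}


def posterior_over_bins(bucket_counts, reference=REFERENCE, prior=None):
    """Posterior P(bin | moves) from per-bucket move counts.

    Uses a multinomial likelihood per bin and, by default, a uniform prior.
    Reference probabilities must be strictly positive (log is taken).
    """
    bins = sorted(reference)
    if prior is None:
        prior = {b: 1.0 / len(bins) for b in bins}
    log_post = {}
    for b in bins:
        log_lik = sum(n * math.log(p) for n, p in zip(bucket_counts, reference[b]))
        log_post[b] = math.log(prior[b]) + log_lik
    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(log_post.values())
    weights = {b: math.exp(lp - top) for b, lp in log_post.items()}
    total = sum(weights.values())
    return {b: w / total for b, w in weights.items()}


def floor_elo(posterior, alpha=0.05):
    """Highest bin edge b such that P(ELO >= b) >= 1 - alpha."""
    tail = 0.0
    for b in sorted(posterior, reverse=True):
        tail += posterior[b]  # tail is now P(ELO >= b)
        if tail >= 1.0 - alpha:
            return b
    return min(posterior)  # fallback for rounding error in the total mass


# A batch of 200 moves whose CPL profile resembles the 2000 bin.
post = posterior_over_bins([120, 48, 20, 12])
print(max(post, key=post.get), floor_elo(post))
```

With a larger batch the posterior concentrates on fewer bins, so the floor moves closer to the most likely bin; with few moves the floor stays conservative.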
(You can click on the image or here to open the full 4-page document.)
For those of you who are not interested in the mathematics, here are the results.
Overall Performance
MAE ≈ 77 ELO → On average, the system is about 77 points off from the actual rating.
--> This means that if a player is 1800 ELO, CAMS II typically estimates them at roughly 1720 to 1880.
RMSE ≈ 105 ELO → RMSE penalizes large errors more heavily, yet it stays low, which indicates overall consistency.
Exact-bin accuracy ≈ 0.38 → The estimate lands in the exact 100-point bin almost 4 out of 10 times.
These values come from ~2.5 million Lichess games, grouped into fixed-size batches.
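For readers who want to see how these three numbers are defined, the sketch below computes them on a tiny set of invented values. The function name `evaluate`, the sample ratings, and the assumption that bins are aligned to multiples of 100 are illustrative only and are not taken from the CAMS II code.

```python
import math


def evaluate(true_elos, predicted_elos, bin_width=100):
    """Return (MAE, RMSE, exact-bin accuracy) for paired rating lists."""
    pairs = list(zip(true_elos, predicted_elos))
    errors = [pred - true for true, pred in pairs]
    # MAE: average size of the error, ignoring its sign.
    mae = sum(abs(e) for e in errors) / len(errors)
    # RMSE: squares the errors first, so large misses weigh more.
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    # Exact-bin accuracy: share of estimates falling in the same bin as the
    # true rating (bins assumed aligned to multiples of bin_width).
    hits = sum((true // bin_width) == (pred // bin_width) for true, pred in pairs)
    return mae, rmse, hits / len(pairs)


# Invented example: errors of +50, -100 and +10 ELO.
mae, rmse, exact = evaluate([1800, 1500, 2200], [1850, 1400, 2210])
print(round(mae, 1), round(rmse, 1), round(exact, 2))
```

In this toy example the single 100-point miss pulls RMSE above MAE, which is the same effect described for the real results.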
An average error of ±77 ELO is more than acceptable for a purely positional estimator: it doesn't consider wins, tempo, or openings, only the quality of moves.
The consistency by bin shows that the method doesn't "collapse" at any range: the curve is smooth and stable.
CAMS II doesn't have a release date yet; it's still in beta, but I hope it substantially improves on some of the limitations of the first version.
See you next time!
JA