Chessmaster 10th & 11 Vanessa Personality


EscherehcsE

Dec 3, 2017

I'm creating this thread to continue the discussion of the Chessmaster Vanessa personality, which ended up hijacking the original thread, "Lucas Chess engine settings are very weak":

https://www.chess.com/forum/view/chess-equipment/lucas-chess-engine-settings-are-very-weak

EscherehcsE

Dec 3, 2017

Well, I think my Chessmaster Vanessa engine tournament is far enough along that I can post some results. It's up to 1,800 games (120 games between each pair of the six engines). I don't know if I'm going to run the tournament beyond this point, as I might be reaching the point of diminishing returns. The tournament was run in Arena 3.0 with a time control of 40 moves in 4 minutes, repeating.
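As a quick sanity check on the game count: six engines form 15 distinct pairings, and 120 games per pairing gives 1,800 games in total. A minimal Python sketch of that arithmetic (the variable names are just for illustration):

```python
from itertools import combinations

# The six engines named in this post: four reference engines plus the two Vanessa personalities.
engines = ["Wing", "Matheus", "Gerbil", "Bestia", "CM10 Vanessa", "CM11 x64 Vanessa"]
games_per_pair = 120

# Every unordered pair of engines meets once per "pairing".
pairings = list(combinations(engines, 2))
total_games = len(pairings) * games_per_pair

# Each engine faces the five others, 120 games apiece.
games_per_engine = (len(engines) - 1) * games_per_pair

print(len(pairings), total_games, games_per_engine)
```

So each engine's rating here rests on 600 of its own games.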

The results are shown below. They were calculated using the Bayeselo program. Note that the + and - columns are the possible margins of error (in elo points) based on a 95% confidence interval.

Of course, the four reference engines (non-CM engines) never perform EXACTLY to their official ratings; there is always some relative disparity between the actual results and the official ratings of the reference engines. My problem was choosing which reference engine to use for the offset value. (The offset value is the engine strength that is assumed to be correct and is used to calculate the other engine strengths.) Since I didn't know which engine's rating was most accurate, my solution was to compromise: I calculated the Bayeselo ratings four different times using four different offset values, then took the average of the four results. Here we go:

Assumed Wing offset:

Assumed Matheus offset:

Assumed Gerbil offset:

Assumed Bestia offset:

Average of all offsets:
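The offset-averaging procedure described above can be sketched roughly as follows. This is an illustration only: the engine names (`ref_a`, `test_x`, and so on) and all of the numbers are hypothetical placeholders, NOT the tournament's data, and the sketch assumes that choosing an offset simply shifts every rating by the same constant, leaving the rating differences untouched.

```python
# Hypothetical relative ratings (arbitrary zero point; only the differences matter).
relative = {"ref_a": 50.0, "ref_b": 10.0, "ref_c": -20.0, "ref_d": -40.0, "test_x": 0.0}

# Hypothetical "official" ratings for the four reference engines.
official = {"ref_a": 2000.0, "ref_b": 1990.0, "ref_c": 1950.0, "ref_d": 1940.0}


def anchored(relative, anchor, anchor_rating):
    """Shift all ratings so that `anchor` lands exactly on `anchor_rating`."""
    shift = anchor_rating - relative[anchor]
    return {name: rating + shift for name, rating in relative.items()}


def average_over_anchors(relative, official):
    """Anchor once per reference engine, then average the resulting tables."""
    tables = [anchored(relative, ref, official[ref]) for ref in official]
    return {name: sum(t[name] for t in tables) / len(tables) for name in relative}


averaged = average_over_anchors(relative, official)
for name, rating in averaged.items():
    note = f" ({rating - official[name]:+.0f} vs official)" if name in official else ""
    print(f"{name}: {rating:.0f}{note}")
```

One property of this kind of averaging is that the reference engines' deviations from their official ratings cancel out: they sum to zero, apart from rounding.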

Discussion of Results:

Looking at the average of all offsets, we see that Wing overperformed its rating by 16 points, Matheus underperformed its rating by 4 points, Gerbil underperformed its rating by 2 points, and Bestia underperformed its rating by 8 points.

The Chessmaster 10 Vanessa personality averaged 1964 elo, while the Chessmaster 11 x64 Vanessa personality averaged 1944 elo. I didn't expect the CM11 Vanessa personality to slightly underperform the CM10 Vanessa personality. (Both personalities used the same internal settings, the same CMX.abk opening book, and roughly the same hashtable size of 17 MB. The CM10 engine was The King 3.33 (32-bit), while the CM11 engine was The King 3.50 (64-bit).)

Bear in mind that with only 1,800 games played, all of these elo ratings have possible margins of error of about plus or minus 21 elo points.
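To put the 20-point gap between the two Vanessa personalities in perspective, here is a small Python sketch. It uses the standard logistic Elo expectation formula, which is an assumption on my part and not necessarily the exact model Bayeselo fits, and it treats the "about 21 points" margin as a flat plus or minus 21 on both ratings. Checking whether the two intervals overlap is only a rough heuristic, not a formal significance test.

```python
def expected_score(diff):
    """Expected score for the higher-rated side under the standard logistic Elo formula."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


cm10, cm11 = 1964, 1944   # averaged ratings quoted in this post
margin = 21               # approximate 95% margin of error quoted in this post

gap = cm10 - cm11
cm10_interval = (cm10 - margin, cm10 + margin)
cm11_interval = (cm11 - margin, cm11 + margin)

# Two intervals overlap if each one starts before the other one ends.
overlap = cm10_interval[0] <= cm11_interval[1] and cm11_interval[0] <= cm10_interval[1]

print(f"gap = {gap} elo, intervals overlap: {overlap}")
print(f"expected score at a {gap}-point gap: {expected_score(gap):.3f}")
```

Under these assumptions a 20-point gap corresponds to an expected score of only about 53% for the stronger side, and the two intervals overlap, so the measured difference between CM10 and CM11 Vanessa is within what the margins of error allow.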

It's interesting to note that my Chessmaster 10th Edition program claims that Vanessa's strength is 2332 elo (base rating of 2111). (This claimed rating will vary slightly, depending on the PC's hardware.) Based on my testing, the claimed rating for Chessmaster 10 Vanessa is too optimistic by 368 elo points (2332 - 1964). Many of the Chessmaster rating estimates are overly optimistic, so this isn't surprising.