Just more engine book theory testing

Dyslexic_Goat

This is in part a continuation of my thread, "The playing style of Stockfish NNUE". I decided to split this off into a separate thread because, so far, NNUE's performance suggests that while it can definitely squeeze out extra wins against other Stockfish versions, it may actually play worse against the more established neural nets (see the LC0 vs Stockfish NNUE statistics for reference). If that holds, NNUE's entry into the top competitive field may not be the blowout its fans are hoping for, though it will almost certainly help the Stockfish development team improve Stockfish 11, 12 and beyond.

With that in mind, I'm currently testing opening book theory using a custom opening book and a selection of today's top engines:

 

Allie 0.6, net t60-4175

Stockfish dev (permanently somewhere between 11 & 12)

Stockfish NNUE

LC0, net t60-4175

 

We have seen neural nets, especially LC0, find some true novelties in competitive play. I'm wondering whether my new rig, a Ryzen 9 3900X combined with an RTX 2060, can find some new ideas given a lot of patience and testing. Stay tuned.

 

Now! On to the setup:

Each engine has two copies competing in a round-robin-style tournament. Time control is 1 minute per move: long enough to avoid time-pressure blunders, but short enough to finish a few matches per day.

One copy of each engine selects moves from a custom opening book designed and maintained by me (for better or worse). It has lines from the MCO/ECO books, PGN files from several top players (Carlsen, Kasparov, Fischer and others), and engine matches, especially selections from TCEC. It also has some lines from my own games, which tend to lead to plenty of mistakes; when that happens, I cancel the game and delete the faulty line as needed. Lines are updated and fine-tuned as mistakes are found.

The most interesting matches tend to be when an engine copy with no opening book fights an opponent that has one, often forcing one engine to justify a slightly worse position.
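To make the setup concrete, here is a minimal Python sketch of the schedule it implies. The engine labels, the assumption that copies of the same engine also play each other, the assumption that each pairing is played with both colours, and the 120-ply average game length are all mine, not part of the setup described above. It only counts games and gives a rough wall-clock estimate at a fixed time per move.

```python
from itertools import combinations

# Hypothetical labels for the four engines listed above.
ENGINES = ["Allie 0.6", "Stockfish dev", "Stockfish NNUE", "LC0"]


def entrants(engines):
    """Two copies per engine: one using the custom book, one without."""
    return [f"{e} ({mode})" for e in engines for mode in ("book", "no book")]


def round_robin(players, both_colours=True):
    """Every player meets every other player once, or twice with colours reversed.

    Assumption: copies of the same engine also meet each other.
    """
    games = []
    for white, black in combinations(players, 2):
        games.append((white, black))
        if both_colours:
            games.append((black, white))
    return games


def wall_clock_hours(n_games, seconds_per_move, avg_plies=120):
    """Rough duration if every ply uses the full fixed time per move.

    avg_plies=120 (60 moves per side) is an assumed average game length.
    """
    return n_games * avg_plies * seconds_per_move / 3600


if __name__ == "__main__":
    schedule = round_robin(entrants(ENGINES))
    print(len(schedule), "games per cycle")
    for spm in (60, 120):
        print(f"{spm} s/move: ~{wall_clock_hours(len(schedule), spm):.0f} h per cycle")
```

Under those assumptions a single game takes about two hours at 1 minute per move, so running one game at a time gives roughly a dozen games per day. Book moves and adjudicated endings would shorten that in practice.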

 

Current results:

Up until recently every single game had been a draw. Now, however, Allie (the no-book copy) has squeezed two points out of the other competitors, this time against LC0, starting from a relatively minor inaccuracy in the opening:

Standings as of now:

[Image: standings table]

 

Dyslexic_Goat

The Polish Opening doesn't automatically lose. How shocking. White did have to accept a fork that lost the a1 rook, but found adequate compensation going into the endgame.

 

Dyslexic_Goat

Very interesting match today: Allie playing White with no opening book vs Stockfish NNUE with no opening book. The game is almost exactly what you'd expect from the two engines' respective playing styles:

Allie sacs a pawn in the opening to gain a decent amount of space on the queenside, and reaches what looks like an equal position until move 19, when Nd6+ turns out to be a mistake. NNUE then finds a way to blast open the kingside, and the game ends in fewer than 40 moves. In fact, it's mostly over in fewer than 30.

 

Dyslexic_Goat

I'll say it again: when Stockfish NNUE finds a way to launch a kingside attack, the results are just surreal.

Dyslexic_Goat

Stockfish NNUE, net 21v1, no book vs LC0, no book. 

 

This match ends the small testing tournament I had begun. The results are below, and I'll be the first to admit that, for the engine copies using opening books, which lines get chosen can be a bit of a lottery. The results are therefore statistically meaningless as proof that any engine here is stronger or weaker; for now, this testing is more about looking for weaknesses and potential improvements to theory.

 

Results for the finished round robin are below. The tournament will now restart with some updates: Allie goes from 0.6 to 0.8 (released today), and LC0 from 0.25 to 0.26, keeping the same net file for now. Stockfish NNUE with net 2141 is still the strongest in my testing against other NNUE nets, but this is bound to change eventually. Other minor changes: switching to two minutes per move and updating various opening lines as needed.

 

[Image: results table for the finished round robin]

 

PerpetuallyPinned

Engines are leaving current theory rather early.

I glanced over the game in post #3, and it seemed that White was maybe having issues around move 13. By move 16, this can't be top-notch stuff.

What are the time controls here?

Dyslexic_Goat

1 minute per move (hardware: AMD Ryzen 9 3900X, 12 cores / 24 threads, plus an RTX 2060). That has been changed to 2 minutes for the next round of testing: while 60 seconds was enough for Stockfish to brute-force its way past most mistakes, it's starting to look like it isn't enough time for Allie and LC0 to avoid running into problems, as Allie's performance in the table above especially suggests. Updating Allie to its current version should partially solve that problem.

They run at nodes-per-second counts that are orders of magnitude lower, though they use those nodes a lot more efficiently.

Dyslexic_Goat
This line came up again. This time, LC0 chooses the less risky of the two options, dxc3 or d3. That minor change leads to a vastly different (and more playable) position for Black.

 

Dyslexic_Goat

AllieStein 15, no opening book, vs Stockfish 11 dev version with the custom opening book. Games like this also show how far SF 11 has fallen behind its top competition in positional skill.

 

Dyslexic_Goat

I'm thrilled to say that I've found a specific line that our current engine overlords are very weak against. It keeps showing up again and again when the bots choose to play the French without fully understanding it:

 

A sample match shows just how poorly this tends to play out:

If anyone's looking to grab a few wins against 3000+ rated opponents, this may be a good variation to do it!

PerpetuallyPinned

17...Qd8

Is that an engine book move?

Dyslexic_Goat

For that game neither side used an openings book past move 12. 

 

PerpetuallyPinned

That's where I think engines are bad for human chess.

Sheer calculation vs understanding

You can probably find plans/reasons to support inferior engine moves, but why do that when other moves are already known?

Maybe bullet/blitz players don't care so much...idk

Dyslexic_Goat
To make sure the variation's weakness wasn't specific to Stockfish, the two engines played a rematch:

 

drmrboss

You should retest your NNUE, because there have been significant Elo changes in NNUE over the last 24 days. T60-4175 is also not the best Leela net; the best are J92-70 or J92-100 (J92-70 is currently used in the Alter Sufi/Nav stream).

 

Here are the performances of different Lc0 nets:


Match: @jhorthos J92-70 vs stockfish 11 in fixed nodes
LC0 version: 0.25.1
LC0 options: cudnn-fp16, 1 thread, default for the rest
SF options: 1 thread, 1GB Hash.
Time control: Leela: 1 kn/move, SF 11: 2 Mn/move
Hardware: RTX 2070, i7-7700
Book: New Chad 6ply book (500 openings)
Tablebases: 6-man TB for both.
Adjudication: 6-man TB, -draw movenumber=50 movecount=5 score=8 -resign movecount=5 score=1000
Software: cutechess-cli
Comments: J92-70: +36 elo, takes the 1st place with safe distance to the 2nd.
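The Adjudication line is terse, so here is a minimal Python sketch of what those two rules mean, based on my reading of how cutechess-cli describes its -draw and -resign options: a draw once at least movenumber moves have been played and both engines' scores have stayed within score centipawns of zero for movecount consecutive moves, and a loss for an engine whose own score has been at least score centipawns below zero for movecount consecutive moves. The Ply record and the adjudicate function are hypothetical; the real tool may have further options and edge cases that this ignores.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Ply:
    move_number: int  # full-move number this ply belongs to
    mover: str        # "white" or "black"
    score_cp: int     # the mover's own eval in centipawns, from its own point of view


def adjudicate(plies: List[Ply],
               draw_movenumber: int = 50, draw_movecount: int = 5, draw_score: int = 8,
               resign_movecount: int = 5, resign_score: int = 1000) -> Optional[str]:
    """Hypothetical adjudicator: "1-0", "0-1", "1/2-1/2", or None to play on."""
    # Resign rule: one engine's last N evals are all at or below -resign_score.
    for side, result in (("white", "0-1"), ("black", "1-0")):
        own = [p.score_cp for p in plies if p.mover == side][-resign_movecount:]
        if len(own) == resign_movecount and all(s <= -resign_score for s in own):
            return result

    # Draw rule: past the minimum move number, both engines' last N evals
    # (the last 2*N plies) stay within draw_score of zero.
    if plies and plies[-1].move_number >= draw_movenumber:
        recent = plies[-2 * draw_movecount:]
        if len(recent) == 2 * draw_movecount and all(abs(p.score_cp) <= draw_score for p in recent):
            return "1/2-1/2"
    return None
```

With these settings a dead-equal position is only called a draw after move 50, while a side reporting -10 pawns or worse for five straight moves is scored as lost at any point.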

Leela: 1 thread, 1Kn/move
SF 11: 1 thread, 2Mn/move

Chad 6-ply book
   #  PLAYER             : RATING ERROR  POINTS PLAYED   (%) CFS(%)     W     D     L
+  1  lc0.net.J92-70.1k  :     36    15   549.5   1000  55.0     80   288   523   189
   2  lc0.net.SV_4585.1k :     27    14   537.5   1000  53.8     56   263   549   188
   3  lc0.net.SV_4175.1k :     26    15   535.5   1000  53.5     50   277   517   206
   4  lc0.net.SV_4619.1k :     26    15   535.5   1000  53.5     66   275   521   204
  ...
   8  lc0.net.J90-78.1k  :     17    14   523.0   1000  52.3     56   243   560   197
   9  lc0.net.J91-20.1k  :     15    14   521.0   1000  52.1     53   239   564   197
  10  lc0.net.J91-80.1k  :     14    15   520.0   1000  52.0     56   241   558   201
  11  lc0.net.J91-40.1k  :     13    14   518.0   1000  51.8     62   233   570   197
  12  lc0.net.J90-60.1k  :     10    15   513.5   1000  51.4     50   225   577   198
  13  lc0.net.J91-150.1k :     10    14   513.5   1000  51.4     56   238   551   211
  ...
- 17  Stockfish_11.2m    :      0  ----  8140.0  17000  47.9     78  3501  9278  4221
  18  lc0.net.SV_3010.1k :     -6    16   491.5   1000  49.1    ---   224   535   241
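For a rough feel of what the RATING and ERROR columns mean, here is a minimal Python sketch that turns a W/D/L record into an Elo difference with the standard logistic formula, plus an approximate 95% margin from the per-game score variance (delta method). This is a generic textbook approximation, not Ordo's actual fitting procedure, and it breaks down for 0% or 100% scores.

```python
import math


def score_fraction(w, d, l):
    """Fraction of available points scored (win = 1, draw = 0.5, loss = 0)."""
    return (w + 0.5 * d) / (w + d + l)


def elo_from_score(p):
    """Logistic Elo model: rating difference implied by expected score p (0 < p < 1)."""
    return -400 * math.log10(1 / p - 1)


def elo_with_margin(w, d, l, z=1.96):
    """Elo difference and an approximate z-sigma margin (z=1.96 is about 95%).

    Uses the per-game score variance plus the delta method; it is not how
    Ordo fits ratings across a whole pool of players.
    """
    n = w + d + l
    p = score_fraction(w, d, l)
    var = (w * (1 - p) ** 2 + d * (0.5 - p) ** 2 + l * p ** 2) / n
    se_p = math.sqrt(var / n)
    # Derivative of the Elo curve: dElo/dp = 400 / (ln(10) * p * (1 - p))
    se_elo = se_p * 400 / (math.log(10) * p * (1 - p))
    return elo_from_score(p), z * se_elo


if __name__ == "__main__":
    # J92-70's row: 288 wins, 523 draws, 189 losses against Stockfish 11
    elo, margin = elo_with_margin(288, 523, 189)
    print(f"J92-70: {elo:+.1f} +/- {margin:.1f}")
    # A small hypothetical match: 1 win, 9 draws, 0 losses
    elo, margin = elo_with_margin(1, 9, 0)
    print(f"10 games: {elo:+.1f} +/- {margin:.1f}")
```

For J92-70's row this gives roughly +35 ± 15, close to the table's 36 and 15; Ordo's own fit differs slightly. The hypothetical 10-game match lands at about the same +35 but with a margin of roughly ±65, which is the same reason a short round robin with mostly draws can't settle which engine is stronger.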

Dyslexic_Goat
Thanks for the advice regarding LC0. To my knowledge my NNUE net is up to date, though: I'm using the latest binary from Abrok and the highest-tested net from https://www.comp.nus.edu.sg/~sergio-v/nnue/

 

I'll go ahead and update LC0.

PerpetuallyPinned

I'm guessing, then, that the engines were forced to play the French Defence. Were they forced through the first 7 moves?

PerpetuallyPinned

Any resource for the New Chad 6ply book (500 openings)?

Dyslexic_Goat
PerpetuallyPinned wrote:

I'm guessing then, the engines are forced to play the French Defence. Were they forced to play 7 moves?

Good question.

I've observed for years that many of the "classical" engines (ones that don't use neural networks) have favored the French as a main response to 1. e4 when they're not given an opening book. I was reminded of this yesterday when Stockfish kept playing the French in the CCC, where the only move given in each match so far was 1. e4. Stockfish's results with the French were less than amazing, for both the new NNUE-merged Stockfish and Stockfish "classic" (11 dev). This exact variation has come up several times as a reason why engines keep misplaying the French.

 

In response to your question: here, yes. In both games the first moves were fed to the engines, and they fought it out from there. But Stockfish and its neighbors have often strayed into this line through their own planning. It's proven to be an annoying blind spot for them.

PerpetuallyPinned

I'm wondering about the Qb6 choice

BTW, I found a wiki page on the Chad opening book:

http://lczero.org/dev/wiki/testing-guide/

The link in the wiki downloads a zip file with a few books in it.