Can we test the current Stockfish beta, please? - Chess Forums

ChessconnectDGTTest

Jul 23, 2024

#1

Hi all,

I was wondering if someone (@desperatekingwalk?) would be available and kind enough to test the very latest SF 16.1 (released today, July 23rd, with the new net, also released today) against the last official (non-beta) SF release, to see if there's any difference in strength.

I am starting to have difficulty understanding how successive patches (which sometimes claim gains of +2 Elo or even more) have not pushed SF past the 4,000 Elo barrier under all test and time conditions.

If you sum up all of the advertised gains, we should be well over 4,000.
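For a sense of scale, here is a minimal sketch of what a "+2 Elo" gain means per game under the standard logistic Elo model. The function name is made up for illustration and does not come from any engine-testing tool; the numbers are plain arithmetic, not test results.

```python
def expected_score(elo_diff: float) -> float:
    """Expected score (win=1, draw=0.5, loss=0) for the side that is
    elo_diff points stronger, under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


# Illustrative gains only; these are not measured values.
for gain in (2, 5, 10):
    print(f"+{gain} Elo -> expected score {expected_score(gain):.4f}")
```

A +2 Elo edge moves the expected score only a fraction of a percentage point above 50%, so it takes a very large number of games to see it at all.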

I may be missing the point here.

Thanks!

AG

ChessconnectDGTTest

Jul 23, 2024

#2

DesperateKingWalk wrote:

What conditions would you like me to test?

Head to head is not really the best way to see rating gains.

You really need a pool of engines of different strengths then you can see rating gains... Even if SF 16.1 vs SF Beta draw all their games.

I will set up a test....

Many thanks, DKW, for your willingness to help! BTW: not urgent at all, of course!

Thanks!

AG

Powderdigit

Jul 23, 2024

#3

Very tricky for a layman like me to understand but looks interesting and great to see the collaboration. 👍

ChessconnectDGTTest

Jul 24, 2024

#4

@DesperateKingWalk excellent, thank you.

The partial results are already interesting. SF 16.1 had already turned out to be somewhat inferior to its predecessors (I wouldn't call it a flop), but it looks like the very recent patches have re-established this version as the leading one among the many they have released. Time will confirm this.

Dragon 3.3 is a bit of a disappointment, as it was expected to be at least on par with the "official" 16.1.

The approach of having the engines use an opening book is interesting. In my test sessions (although not as scientific and well administered as yours!) I usually removed any opening books, to avoid steering the engines in one specific direction or another; I relied only on the engines' pure calculation. Is your approach meant to prevent an engine from always playing the same moves?

ChessconnectDGTTest

Jul 24, 2024

#5

I also forgot to comment: "7 man TB will be in use" - wow, this is HUGE!

ChessconnectDGTTest

Jul 24, 2024

#6

Ha! Interesting. One really needs to wait and judge based on a significant amount of data. This is why tests conducted with a few hundred games are not so meaningful!
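To put a rough number on why a few hundred games say so little, here is a minimal sketch that turns a win/draw/loss tally into an Elo estimate with an approximate 95% interval. The function names and the game counts are hypothetical, chosen only for illustration; the sketch assumes independent games and a normal approximation, which ignores things like paired openings.

```python
import math


def elo_from_score(score: float) -> float:
    """Elo difference implied by a mean score; requires 0 < score < 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (estimate, low, high) in Elo for an approximate 95% interval.

    Assumes independent games and a normal approximation of the mean score.
    """
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    # Per-game variance of the score around its mean.
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * (0.0 - mean) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    return (elo_from_score(mean),
            elo_from_score(mean - z * se),
            elo_from_score(mean + z * se))


# Hypothetical tallies with the same proportions at two sample sizes.
for w, d, l in ((40, 225, 35), (4000, 22500, 3500)):
    est, low, high = elo_with_margin(w, d, l)
    print(f"{w + d + l:>6} games: {est:+.1f} Elo (95% interval {low:+.1f} to {high:+.1f})")
```

With identical win/draw/loss proportions, the 300-game interval still includes zero, while the 30,000-game interval is about ten times narrower and no longer does.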

xuanxuan101

Jul 24, 2024

#7

hello

ChessconnectDGTTest

Jul 24, 2024

#8

DesperateKingWalk wrote:

Here are the final results of the test for Stockfish Beta you requested. The results speak for themselves....

Many thanks for your kind support and availability. Yes, I believe it is clear that 16.1 dev does not bring any noticeable difference compared to the "standard" 16.1, despite the dev team continuing to publish patch after patch, each claiming further gains.

Thanks.

ChessconnectDGTTest

Aug 7, 2024

#9

@DesperateKingWalk, I'm getting back to this thread because I wanted to ask whether you had considered using an anti-draw opening book for your tourney.

https://www.sp-cc.de/anti-draw-openings.htm

The idea is to avoid all those draws and, as far as possible, force the engines toward a decisive result.

Thanks

ChessconnectDGTTest

Aug 7, 2024

#10

Absolutely fine, no problem but thanks anyway.