Can we test the current Stockfish beta, please?

ChessconnectDGTTest

Hi all,

I was wondering if someone (@desperatekingwalk? 😛) would be available and so kind as to test the very latest SF 16.1 (released today, July 23rd, together with the new net, also released today) against the last official (non-beta) SF release, to see if there's any difference in strength.

I am starting to have some difficulty understanding how successive patches (which sometimes claim gains of +2 Elo or even more) have not pushed SF past the 4,000 Elo barrier under all test and time conditions.

If you sum up all of the advertised gains, we should be well over 4,000.

I may be missing the point here.

Thanks!

AG

ChessconnectDGTTest
DesperateKingWalk wrote:

What conditions would you like me to test?

Head to head is not really the best way to see rating gains.

You really need a pool of engines of different strengths then you can see rating gains... Even if SF 16.1 vs SF Beta draw all their games.

I will set up a test....
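
To illustrate the quoted point about using a pool of engines, here is a minimal sketch assuming the standard logistic Elo model. The function name `elo_diff` and all scores below are hypothetical, chosen only for illustration; they are not results from this thread.

```python
import math

def elo_diff(score: float) -> float:
    """Elo difference implied by a score fraction in (0, 1), logistic Elo model."""
    return 400.0 * math.log10(score / (1.0 - score))

# Hypothetical numbers, for illustration only (not measured results).
head_to_head_score = 0.5   # engines A and B draw every game against each other
a_vs_pool_score = 0.80     # A's score against a pool of weaker engines
b_vs_pool_score = 0.78     # B's score against the same pool

# A match made only of draws implies no rating gap at all.
head_to_head_gap = elo_diff(head_to_head_score)

# The same two engines can still be separated by how they score against the pool.
pool_gap = elo_diff(a_vs_pool_score) - elo_diff(b_vs_pool_score)

print(f"Head-to-head gap: {head_to_head_gap:.1f} Elo")
print(f"Gap measured via the pool: {pool_gap:.1f} Elo")
```

With these made-up scores the head-to-head match shows no difference, while the pool still separates the two engines by a couple of dozen Elo points.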

Many thanks DKW for your kind availability! BTW: Not urgent at all, of course!

Thanks!

AG

Powderdigit

Very tricky for a layman like me to understand but looks interesting and great to see the collaboration. 👍

ChessconnectDGTTest

@DesperateKingWalk excellent, thank-you.

The partial results are already interesting. SF 16.1 had already turned out to be somewhat inferior to its predecessors (I wouldn't call it a flop), but it looks like the very recent patches have re-established this version as the leading one among the many they have released. Time will tell.

Dragon 3.3 is a bit of a disappointment, as it was expected to be at least on par with the "official" 16.1.

The approach of having the engines use an opening book is interesting. In my test sessions (although not as scientific and well administered as yours!) I usually removed any opening books, to avoid "driving" the engines in one specific direction or another; I relied only on the engines' pure calculation. Is your approach meant to prevent an engine from always playing the same moves?

ChessconnectDGTTest

I also forgot to comment: "7 man TB will be in use" - wow, this is HUGE!

ChessconnectDGTTest

Ha! Interesting. One really needs to wait and judge based on a significant amount of data. This is why tests conducted with a few hundred games are not so meaningful!
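
As a rough sketch of why a few hundred games say little, the snippet below estimates an Elo difference and an approximate 95% interval from win/draw/loss counts. It assumes the logistic Elo model and a normal approximation of the score; the function names and the game counts are hypothetical, not taken from the test in this thread.

```python
import math

def elo_diff(score: float) -> float:
    """Elo difference implied by a score fraction in (0, 1), logistic Elo model."""
    return 400.0 * math.log10(score / (1.0 - score))

def elo_interval(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (estimate, low, high) in Elo, using a normal approximation."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result (1 for a win, 0.5 for a draw, 0 for a loss).
    variance = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * (0.0 - score) ** 2
    ) / games
    std_err = math.sqrt(variance / games)

    def clamp(s: float) -> float:
        # Keep the score strictly inside (0, 1) so the logarithm is defined.
        return min(max(s, 1e-9), 1.0 - 1e-9)

    return (
        elo_diff(clamp(score)),
        elo_diff(clamp(score - z * std_err)),
        elo_diff(clamp(score + z * std_err)),
    )

# Hypothetical matches with the same win/draw/loss proportions.
small = elo_interval(30, 150, 20)        # 200 games
large = elo_interval(3000, 15000, 2000)  # 20,000 games

for label, (est, low, high) in (("200 games", small), ("20,000 games", large)):
    print(f"{label}: {est:+.1f} Elo, approx. 95% interval [{low:+.1f}, {high:+.1f}]")
```

With these made-up counts, the 200-game interval still includes zero, while the 20,000-game interval around the same estimate does not.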

xuanxuan101

hello

ChessconnectDGTTest
DesperateKingWalk wrote:

Here are the final results of the test for Stockfish Beta you requested. The results speak for themselves....

Many thanks for your kind support and availability. Yes, I believe it is clear that 16.1 dev does not bring any noticeable difference compared to the "standard" 16.1, despite the dev team continuing to publish patch after patch, claiming gain after gain.

Thanks.

ChessconnectDGTTest

@DesperateKingWalk, I'm getting back to this thread, as I wanted to ask if you had considered using an anti-draw opening book for your tourney.

https://www.sp-cc.de/anti-draw-openings.htm

The idea is to try to avoid those many, many draws and, as far as possible, force the engines towards a different outcome.

Thanks

ChessconnectDGTTest

Absolutely fine, no problem but thanks anyway.