
Engine tournament III

I've been running this for a few weeks now, after acquiring a batch of new engines.

The tournament started out with 54 engines; around a dozen have tested Elo ratings above 2900! Many others are rated above 2700, and there are a few more in that ballpark for which I haven't been able to find tested ratings. My original intention was to have each engine play every other engine 6 times, at a time control of 2 minutes + 2 seconds per move. Websites dedicated to engine testing tend to use time controls of only 3 minutes per game, which I personally don't believe is a sufficient test of strength, especially in longer games (engine v engine games can often go well over 100 moves). This first stage is to separate the wheat from the chaff: afterwards I will reset all the grades based on this competition and enter the good engines into round 2, where they will play a longer time control to find out which engine runs best on my computer.

I have also discovered that the engine-testing people only test engines running on a single core, which is not how I would use them. They seem to believe that one engine shouldn't have an advantage over another just because it can use better technology. I don't follow that line of thinking; I think it's right that Deep Rybka 4 should be better than Rybka 4.

Talking of Rybka, it has been recognised as the best engine around for a few years now, but the common belief among those who know is that Houdini 2.0 has now taken the crown. We'll see...

Now that I'm well into the tournament, I can see that it's far bigger and more complicated than I thought it would be at the outset. I realise I'd better start keeping some notes before it gets out of hand (so it might as well be here!). After around 2,000 games, Brutus failed to load, having played 75 of its allotted games. Restarting the program didn't cure the problem, and neither did rebooting the computer. Brutus has now been disqualified and its games removed. I have also sped up the games: the time control has been reduced to just 5 minutes. The previous time control allowed a game of 150 moves (and there were quite a few!) to last 32 minutes per side.

The conditions for the tournament are:-

Engines can use however many cores they want or need, but I only have 2;

All engines have access to the latest version of the "Strong" Fritz Powerbook (more to add on openings books later), and Nalimov Endgame Tablebases.

Comments


  • 5 months ago

    mcris

    Maybe you should post PC specs: Frequency of the processor (and type), RAM (Hash size), chess interface (Fritz?).

    So Brutus has a self-destruct code? Groovy!

  • 3 years ago

    gambit-man

    Finally finished, results below compiled by EloStat 1.3:-

        Program                            Score     %    Av.Op.  Elo    +   -    Draws

     

      1 Houdini 2.0 x64                : 107.5/168  64.0   2943   3043   36  35   53.0 %

      2 Critter 1.4a 64-bit SSE4       :  99.5/168  59.2   2948   3012   34  34   57.7 %

      3 Strelka 5.1                    :  85.0/168  50.6   2955   2959   35  35   56.0 %

      4 IvanHoe 9.47b x64              :  81.5/168  48.5   2957   2947   33  33   61.3 %

      5 Stockfish 2.1.1 JA 64bit       :  78.5/168  46.7   2959   2936   35  35   55.4 %

      6 Fire 2.2 xTreme x64            :  74.5/168  44.3   2961   2921   36  36   54.2 %

      7 Deep Rybka 4 x64               :  73.0/168  43.5   2961   2916   35  36   54.8 %

      8 FireBird 1.2 x64               :  72.5/168  43.2   2962   2914   36  36   53.0 %


    Individual statistics:

     

    1 Houdini 2.0 x64           : 3043  168 (+ 63,= 89,- 16), 64.0 %

     

    Fire 2.2 xTreme x64           :  24 (+  7,= 13,-  4), 56.2 %

    Deep Rybka 4 x64              :  24 (+ 13,=  9,-  2), 72.9 %

    Strelka 5.1                   :  24 (+  6,= 15,-  3), 56.2 %

    Stockfish 2.1.1 JA 64bit      :  24 (+ 12,= 11,-  1), 72.9 %

    IvanHoe 9.47b x64             :  24 (+  9,= 12,-  3), 62.5 %

    FireBird 1.2 x64              :  24 (+  8,= 15,-  1), 64.6 %

    Critter 1.4a 64-bit SSE4      :  24 (+  8,= 14,-  2), 62.5 %

     

    2 Critter 1.4a 64-bit SSE4  : 3012  168 (+ 51,= 97,- 20), 59.2 %

     

    Houdini 2.0 x64               :  24 (+  2,= 14,-  8), 37.5 %

    Fire 2.2 xTreme x64           :  24 (+ 10,= 11,-  3), 64.6 %

    Deep Rybka 4 x64              :  24 (+ 10,= 13,-  1), 68.8 %

    Strelka 5.1                   :  24 (+  6,= 18,-  0), 62.5 %

    Stockfish 2.1.1 JA 64bit      :  24 (+  5,= 17,-  2), 56.2 %

    IvanHoe 9.47b x64             :  24 (+  9,= 11,-  4), 60.4 %

    FireBird 1.2 x64              :  24 (+  9,= 13,-  2), 64.6 %

     

    3 Strelka 5.1               : 2959  168 (+ 38,= 94,- 36), 50.6 %

     

    Houdini 2.0 x64               :  24 (+  3,= 15,-  6), 43.8 %

    Fire 2.2 xTreme x64           :  24 (+  9,= 11,-  4), 60.4 %

    Deep Rybka 4 x64              :  24 (+  5,= 11,-  8), 43.8 %

    Stockfish 2.1.1 JA 64bit      :  24 (+  7,= 15,-  2), 60.4 %

    IvanHoe 9.47b x64             :  24 (+  6,= 14,-  4), 54.2 %

    FireBird 1.2 x64              :  24 (+  8,= 10,-  6), 54.2 %

    Critter 1.4a 64-bit SSE4      :  24 (+  0,= 18,-  6), 37.5 %

     

    4 IvanHoe 9.47b x64         : 2947  168 (+ 30,=103,- 35), 48.5 %

     

    Houdini 2.0 x64               :  24 (+  3,= 12,-  9), 37.5 %

    Fire 2.2 xTreme x64           :  24 (+  3,= 18,-  3), 50.0 %

    Deep Rybka 4 x64              :  24 (+  5,= 17,-  2), 56.2 %

    Strelka 5.1                   :  24 (+  4,= 14,-  6), 45.8 %

    Stockfish 2.1.1 JA 64bit      :  24 (+  5,= 17,-  2), 56.2 %

    FireBird 1.2 x64              :  24 (+  6,= 14,-  4), 54.2 %

    Critter 1.4a 64-bit SSE4      :  24 (+  4,= 11,-  9), 39.6 %

     

    5 Stockfish 2.1.1 JA 64bit  : 2936  168 (+ 32,= 93,- 43), 46.7 %

     

    Houdini 2.0 x64               :  24 (+  1,= 11,- 12), 27.1 %

    Fire 2.2 xTreme x64           :  24 (+  9,= 10,-  5), 58.3 %

    Deep Rybka 4 x64              :  24 (+  9,= 13,-  2), 64.6 %

    Strelka 5.1                   :  24 (+  2,= 15,-  7), 39.6 %

    IvanHoe 9.47b x64             :  24 (+  2,= 17,-  5), 43.8 %

    FireBird 1.2 x64              :  24 (+  7,= 10,-  7), 50.0 %

    Critter 1.4a 64-bit SSE4      :  24 (+  2,= 17,-  5), 43.8 %

     

    6 Fire 2.2 xTreme x64       : 2921  168 (+ 29,= 91,- 48), 44.3 %

     

    Houdini 2.0 x64               :  24 (+  4,= 13,-  7), 43.8 %

    Deep Rybka 4 x64              :  24 (+  2,= 15,-  7), 39.6 %

    Strelka 5.1                   :  24 (+  4,= 11,-  9), 39.6 %

    Stockfish 2.1.1 JA 64bit      :  24 (+  5,= 10,-  9), 41.7 %

    IvanHoe 9.47b x64             :  24 (+  3,= 18,-  3), 50.0 %

    FireBird 1.2 x64              :  24 (+  8,= 13,-  3), 60.4 %

    Critter 1.4a 64-bit SSE4      :  24 (+  3,= 11,- 10), 35.4 %

     

    7 Deep Rybka 4 x64          : 2916  168 (+ 27,= 92,- 49), 43.5 %

     

    Houdini 2.0 x64               :  24 (+  2,=  9,- 13), 27.1 %

    Fire 2.2 xTreme x64           :  24 (+  7,= 15,-  2), 60.4 %

    Strelka 5.1                   :  24 (+  8,= 11,-  5), 56.2 %

    Stockfish 2.1.1 JA 64bit      :  24 (+  2,= 13,-  9), 35.4 %

    IvanHoe 9.47b x64             :  24 (+  2,= 17,-  5), 43.8 %

    FireBird 1.2 x64              :  24 (+  5,= 14,-  5), 50.0 %

    Critter 1.4a 64-bit SSE4      :  24 (+  1,= 13,- 10), 31.2 %

     

    8 FireBird 1.2 x64          : 2914  168 (+ 28,= 89,- 51), 43.2 %

     

    Houdini 2.0 x64               :  24 (+  1,= 15,-  8), 35.4 %

    Fire 2.2 xTreme x64           :  24 (+  3,= 13,-  8), 39.6 %

    Deep Rybka 4 x64              :  24 (+  5,= 14,-  5), 50.0 %

    Strelka 5.1                   :  24 (+  6,= 10,-  8), 45.8 %

    Stockfish 2.1.1 JA 64bit      :  24 (+  7,= 10,-  7), 50.0 %

    IvanHoe 9.47b x64             :  24 (+  4,= 14,-  6), 45.8 %

    Critter 1.4a 64-bit SSE4      :  24 (+  2,= 13,-  9), 35.4 %
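    The Elo column above can be reproduced from the Score percentage and the Av.Op. column. A minimal sketch (my own helper, assuming EloStat uses the standard logistic performance-rating formula, which matches the table to within a point):

```python
import math

def performance_rating(avg_opponent, score_fraction):
    """Performance rating from the average opponent Elo and the score
    fraction p, via the logistic formula: avg + 400 * log10(p / (1 - p))."""
    return avg_opponent + 400 * math.log10(score_fraction / (1 - score_fraction))

# Houdini 2.0: 107.5/168 against an average opposition of 2943
print(round(performance_rating(2943, 107.5 / 168)))  # -> 3043
```

    The same formula gives 2959 for Strelka (85/168 against an average of 2955), so the Elo figures are consistent with the Score and Av.Op. columns throughout.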

  • 3 years ago

    gambit-man

    In the end I plumped for just 8 engines, playing each other 24 times. This stage is now halfway through; the current table standings:-

    The standout head-to-head is the royal whooping dished out to Deep Rybka 4 by Houdini 2. DR4 has under-performed so far, but there are still plenty of games left to turn things around.

  • 3 years ago

    gambit-man

    Full results as compiled by EloStat 1.3:-

        Program                            Score     %    Av.Op.  Elo    +   -    Draws

      1 Houdini 2.0 x64                : 265.0/318  83.3   2754   3034   38  37   26.4 %

      2 Deep Rybka 4 x64               : 247.0/318  77.7   2756   2972   34  33   33.3 %

      3 Stockfish 2.1.1 JA 64bit       : 246.5/318  77.5   2756   2971   33  32   34.3 %

      4 FireBird 1.2 x64               : 246.0/318  77.4   2756   2969   32  32   35.8 %

      5 Critter 1.4a 64-bit SSE4       : 246.0/318  77.4   2756   2969   32  31   37.7 %

      6 IvanHoe 9.47b x64              : 242.5/318  76.3   2756   2958   32  31   36.2 %

      7 Strelka 5.1                    : 242.0/318  76.1   2756   2957   33  32   35.2 %

      8 Fire 2.2 xTreme x64            : 241.5/318  75.9   2756   2956   31  30   39.3 %

      9 Critter 1.2 64-bit             : 240.5/318  75.6   2756   2953   31  30   39.9 %

     10 Komodo64 3                     : 239.5/318  75.3   2756   2950   34  34   31.1 %

     11 Rybka 4 x64                    : 239.0/318  75.2   2756   2948   33  33   33.3 %

     12 Stockfish 1.9.1 JA 64bit       : 236.5/318  74.4   2756   2941   34  33   31.8 %

     13 RobboLito 0.085g3 x64          : 235.5/318  74.1   2756   2938   32  32   35.5 %

     14 Rybka 3                        : 235.5/318  74.1   2756   2938   34  33   31.8 %

     15 Fire 1.31 x64                  : 232.5/318  73.1   2756   2930   32  31   36.2 %

     16 Houdini 1.01 x64 2_CPU         : 231.5/318  72.8   2756   2927   33  32   33.6 %

     17 RobboLito 0.09 x64             : 230.5/318  72.5   2756   2925   33  33   33.0 %

     18 IvanHoe v73                    : 226.5/318  71.2   2757   2914   32  32   34.3 %

     19 Naum 4.2                       : 219.5/318  69.0   2757   2896   33  32   32.4 %

     20 Komodo64 1.3 JA                : 213.5/318  67.1   2757   2881   33  33   31.1 %

     21 Gull 1.2 x64                   : 200.0/318  62.9   2758   2849   32  32   31.4 %

     22 Protector 1.4.0 x64 JA         : 196.0/318  61.6   2758   2840   32  32   32.7 %

     23 Fritz 11 SE                    : 192.5/318  60.5   2758   2832   32  32   33.0 %

     24 cyclone xTreme                 : 183.0/318  57.5   2759   2811   32  32   30.8 %

     25 Glaurung 2.2 JA                : 178.5/318  56.1   2759   2802   32  32   30.5 %

     26 Belka 1.8.20                   : 177.5/318  55.8   2759   2799   32  32   31.8 %

     27 Toga II 1.4.1SE JA             : 177.5/318  55.8   2759   2799   32  32   29.9 %

     28 Toga II 3.1.2SE JA             : 167.5/318  52.7   2759   2778   31  31   34.3 %

     29 Gambit Fruit 1.0 Beta 4bx JA   : 160.0/318  50.3   2759   2762   33  33   27.0 %

     30 Daydreamer 1.75 JA             : 151.5/318  47.6   2760   2743   32  32   29.2 %

     31 Crafty 23.01                   : 148.0/318  46.5   2760   2736   32  32   32.1 %

     32 Cyrano 0.6b17 JA               : 137.0/318  43.1   2760   2712   32  33   28.9 %

     33 RedQueen 0.9.8 JA              : 135.0/318  42.5   2760   2708   33  34   24.5 %

     34 Viper 0.1 JA SMP               : 123.5/318  38.8   2761   2682   34  34   23.0 %

     35 Pepito v1.59.2 x64 JA          : 115.5/318  36.3   2761   2664   33  34   27.4 %

     36 Patriot 2006                   : 115.5/318  36.3   2761   2664   35  35   21.7 %

     37 Kaissa 1.8a                    : 114.0/318  35.8   2761   2660   35  35   20.8 %

     38 Crafty 19.19                   : 112.0/318  35.2   2761   2656   35  35   22.0 %

     39 GreKo 8.0 JA                   : 109.5/318  34.4   2762   2650   35  35   21.7 %

     40 Pawny 0.2.1 JA                 :  99.5/318  31.3   2762   2625   36  37   18.6 %

     41 Flux 2.2                       :  98.5/318  31.0   2762   2623   36  37   18.6 %

     42 Sungorus 1.4 JA                :  91.5/318  28.8   2762   2605   37  37   19.2 %

     43 Comet B68                      :  84.0/318  26.4   2763   2585   38  39   17.6 %

     44 Simplex 0.9.7 rev 180 JA       :  77.0/318  24.2   2763   2565   40  41   13.8 %

     45 Ifrit_j4_3_Beta_6_2_2011_JA    :  76.0/318  23.9   2763   2562   38  39   20.1 %

     46 Bison 8.2.4r                   :  73.5/318  23.1   2763   2555   39  40   17.9 %

     47 Mediocre v0.34                 :  72.5/318  22.8   2763   2552   40  41   14.2 %

     48 Kurt 0.9.2.2 JA                :  67.5/318  21.2   2764   2536   39  40   19.2 %

     49 ZCT JA x64 -0.3.2500           :  57.5/318  18.1   2764   2502   43  44   14.2 %

     50 Carballo Chess Engine v0.5     :  43.0/318  13.5   2765   2443   49  51   10.1 %

     51 demon 1.0                      :  34.0/318  10.7   2766   2398   52  54   10.1 %

     52 Rocinante 1.01 JA              :  26.0/318   8.2   2767   2347   59  61    8.2 %

     53 Philemon C (JA)                :  22.5/318   7.1   2768   2320   58  61    9.7 %

     54 Eden 0.0.13                    :  12.0/318   3.8   2770   2207   82  37    4.4 %

     

    Some issues arising out of stage 1:-

    ·  Some of the book lines went so deep (I saw some go 30 moves in) that the result was virtually decided before the engines took over.

    ·  Some lines carry a distinct advantage for one side or the other. Those in the know tell me that the engines cannot seek out favourable lines; they merely select book lines at random.

    ·  Many of the engines could not access the Nalimov endgame tablebases; they use a different tablebase format.

    For stage 2, I had hoped to whittle the competitors down to about 10 or 12, but looking at the finishing table, 18 seems the more natural place to split it: there is little difference between any engine and those either side of it until then, but there is a clear gap between 18th and 19th place.

    I was minded to increase the time control to 2 minutes + 6 seconds per move (the increments alone come to 3 minutes over a 30-move game, 6 minutes over a 60-move game, etc, on top of the base time) and to have each engine play every other engine 12 times. My estimate is that this would take around 6 weeks to complete, and I’m not sure I want to commit to that, so I’m going to mull it over for the next couple of days.
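    As a sanity check on the arithmetic (a sketch of my own, nothing to do with the engines themselves): the total thinking time per side is the base time plus the increment accumulated over the game.

```python
def clock_seconds(base_minutes, increment_seconds, moves):
    """Total thinking time available per side, in seconds,
    for a game lasting `moves` moves."""
    return base_minutes * 60 + increment_seconds * moves

# At 2 min + 6 s/move the increments alone add 3 minutes over 30 moves,
# giving each side 5 minutes in total; a 60-move game allows 8 minutes.
print(clock_seconds(2, 6, 30) / 60)  # -> 5.0
print(clock_seconds(2, 6, 60) / 60)  # -> 8.0
```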

    I had assumed that the ‘Strong’ version of the Fritz Powerbook would have no weak lines; this has proved incorrect. I had considered using a professionally produced test set, designed with such tournaments in mind, where all lines are roughly equal. Noomen’s Test Suite 2012 is almost ideal in this respect, except I don’t feel it offers sufficient variety of play. A balance has to be struck. I have developed my own openings book, which I intend to enter into some upcoming tournaments (run by other people). This is my first attempt at a competitive openings book, and I’ll be up against people who have been at it for ages, so I expect to finish in division 3. But hey, there’s only one way to learn.

    Talking of learning, I will switch ‘book learning’ on, meaning the engines will avoid repeating lines that have ended in losses. This will also fine-tune my book for entering those book tournaments.

    I have acquired a full set (5 piece) of Gaviota endgame tablebases, which I believe to be the format used by those engines which cannot access the Nalimov set. Hopefully this means that all the engines will now be able to access endgame tablebases.

    Finally for now, in tidying up this page I’m left with a whole bunch of “[COMMENT DELETED]” entries on this page, and it’s pissing me off. Anyone know how to get rid of them?

  • 3 years ago

    gambit-man

    I've acquired another promising engine, Critter 1.4. Critter 1.2 is already running in the tournament and is widely regarded as one of the better engines, with a tested rating of 2952. It also seems better than even the strongest engines at solving complex mates. It would be foolish not to include the new version, assuming it improves on the old one. I'm as yet undecided whether to add it now or wait until I weed out the weaker engines.
