15 Years of Chess Engine Development: What would the score be between a super GM and engines now?

Sort:
drmrboss

(credit from reddit)

Fifteen years ago, in October of 2002, Vladimir Kramnik and Deep Fritz were locked in battle in the Brains in Bahrain match. If Kasparov vs. Deep Blue was the beginning of the end for humans in chess, then the Brains in Bahrain match was the middle of the end. It marked the first match between a world champion and a chess engine running on consumer-grade hardware, although its eight-processor machine was fairly exotic at the time.

Ultimately, Kramnik and Fritz played to a 4-4 tie in the eight-game match. Of course, we know that today the world champion would be crushed in a similar match against a modern computer. But how much of that is superior algorithms, and how much is due to hardware advances? How far have chess engines progressed from a purely software perspective in the last fifteen years? I dusted off an old computer and some old chess engines and held a tournament between them to try to find out.

I started with an old laptop and the version of Fritz that played in Bahrain. Playing against Fritz were the strongest engines at each successive five-year anniversary of the Brains in Bahrain match: Rybka 2.3.2a (2007), Houdini 3 (2012), and Houdini 6 (2017). The tournament details, cross-table, and results are below.

Tournament Details

  • Format: Round Robin of 100-game matches (each engine played 100 games against each other engine).

  • Time Control: Five minutes per game with a five-second increment (5+5).

  • Hardware: Dell laptop from 2006, with a 32-bit Pentium M processor underclocked to 800 MHz to simulate 2002-era performance (roughly equivalent to a 1.4 GHz Pentium 4, which would have been a common processor in 2002).

  • Openings: Each 100-game match was played using the Silver Opening Suite, a set of 50 opening positions designed to be varied, balanced, and based on common opening lines. Each engine played each position with both white and black.

  • Settings: Each engine played with default settings, no tablebases, no pondering, and 32 MB hash tables, except that Houdini 6 played with a 300 ms move overhead, because in test games the modern engines were frequently losing on time, possibly due to the slower hardware and interface.
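
For reference, the hash, pondering, and move-overhead settings above correspond to standard UCI engine options that a GUI sends at startup. A sketch of what those commands look like (option names follow the UCI convention as used by Stockfish; Houdini's exact option names may differ):

```
setoption name Hash value 32
setoption name Ponder value false
setoption name Move Overhead value 300
```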

Results

Engine          1          2          3          4          Total
Houdini 6       **         83.5-16.5  95.5-4.5   99.5-0.5   278.5/300
Houdini 3       16.5-83.5  **         91.5-8.5   95.5-4.5   203.5/300
Rybka 2.3.2a    4.5-95.5   8.5-91.5   **         79.5-20.5  92.5/300
Fritz Bahrain   0.5-99.5   4.5-95.5   20.5-79.5  **         25.5/300

I generated an Elo rating list using the results above. Anchoring Fritz's rating to Kramnik's 2809 at the time of the match, the result is:

Engine          Rating
Houdini 6       3451
Houdini 3       3215
Rybka 2.3.2a    3013
Fritz Bahrain   2809
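
The logistic Elo model behind such a list can be sketched as follows. Note that the post does not say which fitting tool produced the ratings above (likely a simultaneous fit over all games, e.g. BayesElo or Ordo), so the simple pairwise formula below will not reproduce them exactly; it is only a minimal illustration of the score-to-rating-difference relationship.

```python
import math

def elo_diff(score):
    # Rating difference implied by a score fraction under the logistic Elo model
    return 400 * math.log10(score / (1 - score))

def expected_score(diff):
    # Inverse: expected score for the stronger side at a given rating difference
    return 1 / (1 + 10 ** (-diff / 400))

# Houdini 6's 99.5/100 against Fritz, taken as a single pairwise result:
pairwise = elo_diff(99.5 / 100)  # roughly 920 Elo, far more than the
                                 # 642-point gap in the fitted list above
```

The discrepancy between the pairwise number and the fitted list is expected: a simultaneous fit pools evidence from all six pairings, which moderates the extreme results.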

Conclusions

The progress of chess engines in the last 15 years has been remarkable. Playing on the same machine, Houdini 6 scored an absolutely ridiculous 99.5-0.5 against Fritz Bahrain, conceding only a single draw in a 100-game match. Perhaps equally impressive, it trounced Rybka 2.3.2a, the engine that I consider to have begun the modern era of chess engines, by a score of 95.5-4.5 (+91 =9 -0).

This tournament indicates that there was clear and continuous progress in the strength of chess engines during the last 15 years, averaging roughly 43 Elo per year (642 points over 15 years). Much of the reporting on man vs. machine matches focused on the calculating speed of the computer hardware, but it is clear from this experiment that one huge factor in computers overtaking humans in the past couple of decades was the increase in the strength of engines from a purely software perspective. If Fritz was roughly the same strength as Kramnik in Bahrain, it is clear that Houdini 6 on the same machine would have completely crushed Kramnik in the match.

Possible conclusion: even on hardware from 15 years ago, today's Stockfish or Houdini would crush a 2800-rated GM by something like 99-1 in a 100-game match.

With current hardware roughly ten times more powerful on top of the software gains, Stockfish on a common desktop would crush a 2800 GM by 399-1 in a 400-game match!

notmtwain

Not a fair match.  You gave Houdini 6 a time handicap. How did you decide that 300 ms was appropriate?

drmrboss
notmtwain wrote:

Not a fair match.  You gave Houdini 6 a time handicap. How did you decide that 300 ms was appropriate?

ms = milliseconds of move overhead. Yes, it is a penalty (time subtracted from the allocated time).

In practice, the chess engine's clock and the Graphical User Interface's clock (Arena, Fritz, etc.) run independently, and there is communication delay between them. So even when the GUI shows the engine with, for example, 1 second (1,000 ms) remaining, the time the engine effectively has can differ from that by the time messages pass back and forth. Houdini may therefore believe it still has the full 1,000 ms, overthink, and lose on time.

To prevent this, modern engines like Stockfish use a move overhead of around 200 ms as standard (so out of 1,000 ms of allocated time the engine actually thinks for only about 800 ms) to avoid unnecessary time losses. Users raise it toward 1,000 ms for a laggy GUI protocol, and up to 2,000 ms for web engines because of the additional network lag.
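
The mechanism described above can be sketched in a few lines. This is an illustrative helper, not any real engine's time-management code; the function name, the 40-moves-to-go assumption, and the default overhead are all assumptions for the example:

```python
def allocate_time(remaining_ms, increment_ms, moves_to_go=40, overhead_ms=200):
    """Reserve overhead_ms for GUI/communication latency, then budget the
    rest over an assumed number of remaining moves (numbers illustrative)."""
    usable = max(remaining_ms - overhead_ms, 10)  # never go to zero
    return usable / moves_to_go + increment_ms

# At the start of a 5+5 game with a 300 ms overhead, the budget per move is
# (300000 - 300) / 40 + 5000 = 12492.5 ms, i.e. about 12.5 seconds.
budget = allocate_time(300_000, 5_000, overhead_ms=300)
```

The key point is simply that the overhead is subtracted before any thinking time is budgeted, so a laggy interface can no longer push the engine past its real clock.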

To answer your question: a 300 ms overhead out of 5 minutes (300,000 ms) is negligible, a 0.1% time penalty. Since a 100% time bonus/penalty is usually worth about 50 Elo, a 0.1% time penalty is approximately 0.05 Elo.

Preggo_Basashi

Neat, and this is blog-worthy, not just forum-worthy.

Anyway, I'll be really impressed when technology reaches the level where it can calculate as little as humans do, but still beat them.

Do humans even calculate one position per second on average?

TracySMiller

Fantastic test results! Chess engines have indeed advanced amazingly. I'm curious how many Elo points Stockfish or Houdini running on modern computers would be above the 3451 of Houdini 6 running on 2002 hardware. Over at the HIARCS forum there is a lot of discussion about some of the more popular computer rating lists, which seem to show the strongest engines at only around 3400, when many of us feel they are considerably stronger than that.

Molotok89

I think this test underestimates the ability of super GMs to steer the game into an easily drawn endgame by choosing very drawish openings as White, because the engines always play the objectively best moves without such a strategy in mind. So in a realistic match against a 3500-rated computer opponent, a human 2800 GM would score much better than 399-1.