
Can a GM and Rybka beat Stockfish?

  • GM DanielNaroditsky
  • | Aug 8, 2014
  • | 39042 views
  • | 80 comments

About two months ago, Rubik's Cube extraordinaire and chess aficionado Tyson Mao approached me with a fascinating proposal. Together with his friend Jesse Levinson, Tyson was exploring the current state of computer chess engines.

It is well established that the days of human-computer rivalry are long gone. In a four- or five-game match, even Magnus Carlsen will stand no chance against Houdini or Stockfish running on decent hardware. 

But Jesse and Tyson began to wonder if a powerful engine would find it as easy to vanquish a human and a weaker computer working in tandem.

In other words, will a human grandmaster be able to make up the difference between two engines of varying strength?

When Tyson wrote to me in May, he had the experiment planned out: I would play a four-game match against Stockfish 5 (currently rated 3290, 13 points above Houdini 4) using the 2008 version of Rybka (rated approximately 3050).

The time control would be 45 minutes for the entire game, with a 30-second increment from move one.

My interest was instantly piqued: I was quite pessimistic about my chances of winning the match, but I was practically sure that Rybka and I would be tough to crack. Furthermore, I've always wondered if there are certain types of positions in which humans can still outfox their silicon fiends.

Was my confidence misplaced? Could a GM working together with an old version of Rybka beat Stockfish 5? Read on to find out!



Man or Machine? by Nathan Rupert

Game 1: 

I chose White for the first game -- since Stockfish had no opening book installed, I figured that I would have a significant advantage in the opening. Furthermore, I had the option of analyzing the position on my computer (i.e. moving the pieces around) at any time during the game, which would (in theory) allow me to counterbalance the disparity in tactical vision and calculation speed.

However, by move 15, I began to understand just how mistaken I was on practically every count. 

A frustrating start to the match!

I was particularly amazed by Stockfish's positional understanding (13...Rc8, and 15...c4 were especially noteworthy), and rather chagrined at Rybka's tactical myopia -- it hugely underestimated 15...c4, entirely overlooked the kingside pawn storm (which I actually saw coming), and could not comprehend Stockfish's tactical wizardry at the end of the game.

Nevertheless, much of the blame fell on my shoulders -- after all, I was the one making the moves! 

Game 2: 

I learned my lesson the hard way: I had no chance of surviving in a tactically complex position, and sacrifices were simply out of the question. With that comforting thought in mind, I started the clock. 

And this, folks, is why computers are no longer playable! It was not any individual move, but rather Stockfish's endgame play as a whole, that made an indelible impression on me.

Even Bobby Fischer would have probably acquiesced to a draw by move 40, but the silicon monster truly made something out of nothing. 

Game 3:

After re-energizing with a burrito (I must confess that I barely resisted the temptation to hurl it at the computer), I sat back down at the board for the second half of the match. By now, I was firmly convinced of my opponent's total infallibility, but I was determined to at least die standing. 

When Stockfish blitzed out 15...f6, tearing open the center and seizing the initiative, I realized that my days were numbered. Instead of calmly drying the position out with a London System or King's Indian Attack, I managed to choose an opening that played right into Stockfish's hands. Mea culpa, Rybka!

Game 4: 

Any thoughts of winning the game with Black were entirely out of the window at this point, but losing 4-0 was not something I was particularly keen on either. As it turns out, determination is a powerful force indeed.

Not a particularly eventful game, but at least it was a consolation goal of sorts. (Indeed, the Germany-Brazil thrashing in the World Cup bears quite a resemblance to this match.) Although I cannot say that I am fully satisfied with our play in the first three games, the match was an unforgettable experience and -- to put it simply -- I had a lot of fun! 

Finally, I would like to thank Tyson, Jesse, and Mr. Levinson for putting on a world-class event. The free sandwiches, the live broadcast on Chess.com, and the flawless computer-and-board setup were all indications of the impeccable organization, and more than 30 people came to watch the games live.

And now, back to human chess! 



Comments


  • 6 weeks ago

    edwardchess2

    I had the pleasure of watching this match in person on the beautiful Crystal Springs Uplands school campus in Hillsborough, CA. It was a very well administered event that included food and drink for the spectators. What I found most remarkable was Daniel's graciousness and care in answering all questions from the spectators after each game. Although mauled by Stockfish in the first three games, he showed absolutely no pique, anger or shame that the vast majority of GM's would have in the same circumstance. He did not blame Rybka's lack of tactical vision (compared to Stockfish). If I can be allowed to speculate, I believe Daniel's main reaction was admiration for how strong Stockfish played and that some of its moves were so deep they were beyond human GM comprehension (during the time the game was being played).

    ---Edward D.

  • 7 weeks ago

    R0yalGuard

    GM Carlsen + Rybka can defeat Stockfish 5. 

  • 7 weeks ago

    MinimusMax

Well, referring to Critter as a "mediocre" engine is a bit misleading, seeing as it's still in the top 5 engines, behind only SF, H, K, and maybe the latest version of Gull.

    For example here: http://www.computerchess.org.uk/ccrl/4040/rating_list_pure_all.html

    That would be like calling Grischuk, Topalov, or Nakamura "mediocre" GMs :)

    A human and a truly "mediocre" engine stand no chance in a match against SF 5 at OTB time controls. It simply sees too much tactically, so the mediocre engine, while stronger tactically than the human, will still miss many of SF's deep combinative shots.

    At the same CCRL link posted above, just click on SF 5 and look at its results against engines like Hiarcs (or even Critter!); it's not pretty.

    In fact, an 8 game match between SF 5 and Rybka 3 was played about a month ago, with Rybka 3 getting pawn and move odds at G/20 with a 10 second increment (each game a different pawn was removed, a through h).

    Rybka 3 lost +0 =5 -3, which is pretty rough for an engine that good at pawn and move odds, and a good indication of just how ridiculously strong SF 5 is.

    Sure, SF 5 isn't perfect. As pfren said, you can still find positions engines misplay or misevaluate in terrible fashion (there are, after all, still decisive results in games between engines), but there aren't nearly so many of them. 

    That DOES show that engines can in principle be beaten, at least in a game here or a game there. However, inferring from the mere existence of such positions that a mediocre engine with a human could crush an unassisted SF 5 is a bit tenuous.

    I know there are some strong correspondence GMs who don't think there's a lot for the human to add these days; for example, Uri Blass has been pretty vocal about this over at rybkaforum.net.

    At the very least, if you think those positions are indicative of some sort of weakness, you should see all the positions those human-things misplay and misevaluate; there are far more of those :)

  • 7 weeks ago

    tpe09222012

    @pfren

    Very interesting. Thanks for sharing your experiences and adding to the discussion in a substantive way

  • 7 weeks ago

    IM pfren

    Quoting GM Naroditsky: "An interesting argument, but would you care to delineate this method of using a mediocre engine to crush Stockfish? I'm sure many people besides me would like to know."


    Dear GM Naroditsky,

    I am a relative newcomer to computer-assisted correspondence chess. Of course I do use an engine (my preference is still Critter, about on par, strength-wise, with Rybka 3, and I run it on low-end hardware, but I also use Stockfish for tactical proofing -- just because Stockfish is unable to play "human" chess, but it clearly is a tactical monster). I have a rating around 2400 (IM level for correspondence), and leaving aside wins where the opponent has not done his homework at the opening stage (which is not unusual, even for players rated around 2500 or so), my method is really simple: get a playable, close-to-equal middlegame, and head straight for the endgame, where most engines play surprisingly poorly. This simple method has earned me a lot of points, and I'm pretty sure a stronger player than I am (like you, for instance) would have less trouble winning "equal" games. But of course modern correspondence means that the opponents have PLENTY of time to figure out the subtleties of a position -- while I assume that your experiment used regular OTB time controls -- so we are probably comparing apples with oranges.

    So, to sum it up:

    Form a reasonable plan, let the engine analyse, and then forward the position several moves ahead, to a position you have envisioned, and which the engine could not possibly consider, due to the horizon effect. Make something sensible out of the engine's output (because the engine's evaluation is often nonsensical: I have met positions where an engine says "equal" and one side has a decisive advantage, or others where the engine gives something like +3.70 and in fact the position is a dead draw). You can figure out HOW to use a chess engine properly only if you play a great deal of modern correspondence games.

  • 7 weeks ago

    nikhil200029

    awesome, please make more of these!

  • 7 weeks ago

    mcris

    About this I recommend Opera Game (Wikipedia). Guess who won?

  • 7 weeks ago

    IM DanielRensch

    You underestimate Magnus, Ben!

    Of course you're right, though.

    Anyway, we will settle this in our next Pardon Our Blunders!

  • 7 weeks ago

    mkkuhner

    I really liked this article.

    It seems to me that you spent some time considering Stockfish's probable weaknesses and thinking about how to play into them.  However, the human+Rybka combo also has some characteristic weaknesses.  I wonder if there is anything that could be done to bolster these.  I see four weaknesses in particular:

    (1)  The human gets tired.  You mentioned this at several points in your commentary.  I have the impression that the games were played nearly back-to-back and this probably handicapped your team.  If there's a rematch you'd probably fare better at game-per-day.  There's also probably an optimal time control for maximizing the human's usefulness, and I don't think we know what it is yet.

    (2)  Human and Rybka communicate poorly.  In particular Rybka can't explain its reasoning. There's an interface challenge here for programmers:  how can more of the engine's reasoning be brought to the surface?  Rybka's blunder-catching powers aren't fully usable because you have to overrule it in order to play better than it does, which means you can still blunder if you don't know why it hates the moves it hates....

    (3)  Collaborative chess in general is hard; my understanding is that two strong humans don't reliably beat one strong human, and can actually be inferior--distraction? style clash?  It's not a common style of play; if it became more common I bet we'd learn techniques for improving collaboration.  (Could be quite interesting, too!)

    (4)  In your commentary, you definitely sound as though you were suffering from the psychological impact of your opponent's rating and prior results: this led to the locking-up of game 4 when you might have had winning chances.  This happens constantly in human-vs-human as well, though (I suspect that if you were facing a 4-game match vs. Carlsen you'd feel a bit intimidated) and I doubt there are easily available solutions--the cyborg still has feelings and if you stifle those you're likely to be left with just Rybka.

    Anyway, thanks again for the really interesting experiment!  As both a chessplayer and a computer scientist I'm fascinated by this stuff.

  • 7 weeks ago

    chessmaster102

    @Elubas I didn't mean to sound mean, but I was questioning his mindset on how he approached it, not his play, which was still fairly solid despite the losses.

  • 7 weeks ago

    Elubas

    "Nice article and not to be rude but when you said sacrifices were out of the question it shows that you have close to zero experience of even watching other Human vs computer games. Other than playing closed positions one of the best ways to play a computer is to play sacrificially this is so when the computer has no opening book which you state stockfish didn't. The reason being computers still value material over positional considerations so gambits,positional exchange etc...tend to work quite well against them. You also said you would avoid tactical complex positions which is a little insane considering you had rybka to back up your analysis plus your own intuition backed by your experience. Honestly your whole mindset was a bit cowardly. You could have been closer to beating stockfish than you think."

    It is such a huge jump to think that a grandmaster didn't consider any of these points you made, times ten. The strength of the strong engine was very clearly demonstrated here. You saw for yourself how Naroditsky struggled through trying to come up with plans, which is supposed to be the foolproof way of winning when backed with a computer to blundercheck.

    You're talking about so many things in the abstract, which Naroditsky doubtlessly understood, and much better than you, even though right in front of us we have the empirical results of these ideas plain for us to see. I don't see how it's so hard to believe that A: things don't always go according to plan and B: Yes, the strain of fighting a computer will cause mistakes for a human.

    Even when the game just got a tiny bit open in that Steinitz French, even though there was basically just a half-open c-file and a half-open d-file, everything else being locked up... even then, the tactical possibilities for Stockfish were overwhelming to handle, practically speaking.

  • 7 weeks ago

    DATACOMMANDER

    "If Deep Blue was able to defeat the strongest chess player in the world more than 15 years ago (1997), and engines have been getting much stronger with every passing year, then there needs to be some way to translate this progression into human terms (i.e. rating)."

    Humans also improve over time. I'm sure that the difference in strength between Stockfish and Deep Blue--ignoring hardware considerations--is greater than the difference in strength between Carlsen and Kasparov circa 1996, but that's partly because Kasparov was an exceptionally strong world champion. The difference in strength between the average top-10 human today and the average top-10 human in 1996 is significant.

  • 7 weeks ago

    Daniel_Cohen

    Since the time usage is omitted, it's hard to take this article seriously as the information is brutally incomplete.

    Unlike the author, I was not particularly impressed by Stockfish 5's endgame play; in fact, I take exception to the assertion that a queen plus three minor pieces and a bunch of pawns constitutes an endgame. That the kings were still targets and not active participants is the foremost refutation of such a conception.

    Nevertheless, keep trying. This was an experiment poorly conducted and described flippantly, with immaterial commentary and crucial points unarticulated, but continual improvement will eventually produce useful insights into the Man+Machine idea that Kasparov advanced in the late 1990s.

  • 7 weeks ago

    Ormiston313

    GM Naroditsky, do you believe you would have won game 4 if you hadn't played for the draw with h4? Thank you for sharing this experience with all of us. You must have known your honest commentary would invite critical and rude responses from a handful of know-it-alls who will never accomplish 2% of what you have. Keep up all the great work!

  • 7 weeks ago

    mcris

    Deep Blue was a "monster" with hundreds of cores and dedicated chess-playing cards; there is no comparison with a system of today (apart maybe from Cluster Rybka or Deep Hydra). As I posted in my blog, there is a way to translate computer ratings into human ones: by setting up a match between a GM and an engine of equal rating.

  • 7 weeks ago

    GM DanielNaroditsky

    A few more responses to the great feedback:

    @RandomAlex: You bring up an interesting point, although I would argue that the difference between two 1800 players and a GM is larger than a GM and Rybka against Stockfish. There were indeed moments in which I played (too) quickly, especially in the second game. In that game, I didn't see how White could improve his position (and neither did Rybka) - Stockfish was ostensibly moving back and forth, so I naively followed suit without spending much time on the waiting moves. 

    @Rise_of_Nations: Thank you for the kind words of support - much appreciated! 

    @ IM pfren: An interesting argument, but would you care to delineate this method of using a mediocre engine to crush Stockfish? I'm sure many people besides me would like to know. 

    @DATACOMMANDER: The question of assigning Elo ratings to chess computers is indeed one that has engendered controversy. You are right in saying that there is no conceivable way (short of involving computers in regular tournaments) to ensure that their rating is accurate. However, I think that the point of this value (3400) is twofold: 

    1. To highlight the marked improvement in strength of Stockfish when compared to previous versions and other engines. 

    2. To highlight that its playing strength is unattainable by a human. If Deep Blue was able to defeat the strongest chess player in the world more than 15 years ago (1997), and engines have been getting much stronger with every passing year, then there needs to be some way to translate this progression into human terms (i.e. rating). Nevertheless, you bring up an astute point that needs to be taken into account. 

  • 7 weeks ago

    mcris

    @IM pfren: Your affirmation is only a vain theory built on words like "strong player" and "know how". So are you a strong player? Do you know how to use a weak engine so as to beat the strongest chess engine? If so, please demonstrate your theory; if not, it is as if you said nothing, or worse.

  • 8 weeks ago

    DATACOMMANDER

    I can't get past the constant hammering of Stockfish's "3400" rating. Elo ratings are meaningful only within a single player pool. Since the top engines and the top human players play one another very infrequently, the two player pools are almost completely segregated, and comparisons between engine ratings and human ratings are virtually meaningless.

    Of course, there is no doubt that the top engines are stronger than the top humans, but without pitting them against one another in a significant number of games it's impossible to say just how much stronger the engines are. They could wind up with (human-pool) Elo ratings of anywhere from 2800 to 3500 or so, but their computer-pool Elo ratings cannot be used to predict where in this range their human-pool Elo ratings would end up.

    Referring to a top engine as "a 3400" when context makes it clear that you're comparing that 3400 to human Elo ratings is, quite simply, nonsense.
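    The point about segregated pools can be made concrete with the standard Elo expected-score formula, E_A = 1 / (1 + 10^((R_B - R_A)/400)). The sketch below (my own illustration, not part of the original comment) shows how a nominal "3400" engine facing a 2850 human would be predicted to score about 96% -- a prediction that only holds if both ratings were earned in the same player pool.

    ```python
    # Illustrative sketch of the Elo expected-score formula underlying the
    # comment above. The prediction is only meaningful when both ratings
    # come from the same player pool.
    def expected_score(r_a: float, r_b: float) -> float:
        """Expected score of player A against player B under the Elo model."""
        return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

    # A nominal "3400" engine vs. a 2850 human: the formula predicts ~0.96,
    # but this number is vacuous if the 3400 was earned in an engine-only pool.
    print(round(expected_score(3400, 2850), 2))
    ```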

  • 8 weeks ago

    etourneau

    That was a great, entertaining, crystal-clear article. I liked the clarity of the explanations. And that's coming from someone who's not even 20 years old...


    A few questions :

    -I wonder how, in the second game, 56...h4 was chosen... Didn't Rybka protest against this move? Rybka may not be as strong as Stockfish, but it's still able to recognize a "??" blunder, isn't it? So I guess Rybka wasn't consulted here.


    -And also about the "no book" configuration of Stockfish... So Stockfish basically reinvented the French entirely by itself? Doesn't the program have *some kind* of opening book integrated even if you don't give it an extensive, external book?
