Which AI Is Better At Chess, ChatGPT Or Gemini?

NathanielGreen

Last time we talked chess with ChatGPT, it had all sorts of weird ideas and terrible advice about chess. So, of course we left it at that, right?

Au contraire. We doubled down—literally. We added another AI, Google's Gemini, into the mix and had the two of them duke it out in a battle of the "wits" to see which model is better at chess.

So who won? Read on to find out!

The Game

What better way to test out the AIs than have them play a game against each other? First, here is the result by itself with no commentary. How many hilarious mistakes can you find?

And now here is the game with our favorite moments, including statements made by the AIs themselves, and some of the funnier backstories behind a few of the moves.

The "brilliant" move by ChatGPT which led Gemini to believe it had no legal replies, forcing "resignation".

You can play around with the game yourself at this analysis page.

The Analysis

What did Game Review have to say about all this in terms of accuracy?

So it is not just the result; the accuracy paints the same picture: ChatGPT is better at chess.

Here is how those moves break down by percentage (book moves removed).

Move          ChatGPT   Gemini
Brilliant     0%        0%
Great         0%        0%
Best          11%       11%
Excellent     18%       14%
Good          14%       14%
Inaccuracy    11%       14%
Mistake       4%        29%
Miss          39%       7%
Blunder       4%        11%

A particularly exciting portion of the game from moves 17-23.

Gemini's mistakes and ChatGPT's misses tell a lot of the story. One AI kept giving the other opportunities, and the other kept refusing those gifts. The good news for ChatGPT is that it made more "Good" or better moves than Gemini made mistakes and blunders. The bad news for Gemini is, well, almost everything that happened.
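For anyone who wants to check that "good news" claim against the table, here is a quick sketch in Python. The numbers are copied from the Game Review table; the dictionary layout and the `share` helper are just illustrative names, not anything Game Review exposes.

```python
# Percentages copied from the Game Review table (book moves removed).
percentages = {
    "ChatGPT": {"Brilliant": 0, "Great": 0, "Best": 11, "Excellent": 18,
                "Good": 14, "Inaccuracy": 11, "Mistake": 4, "Miss": 39,
                "Blunder": 4},
    "Gemini": {"Brilliant": 0, "Great": 0, "Best": 11, "Excellent": 14,
               "Good": 14, "Inaccuracy": 14, "Mistake": 29, "Miss": 7,
               "Blunder": 11},
}

GOOD_OR_BETTER = ("Brilliant", "Great", "Best", "Excellent", "Good")
MISTAKES_AND_BLUNDERS = ("Mistake", "Blunder")


def share(player: str, categories: tuple) -> int:
    """Add up a player's percentages over the given move categories."""
    return sum(percentages[player][category] for category in categories)


chatgpt_good = share("ChatGPT", GOOD_OR_BETTER)
gemini_errors = share("Gemini", MISTAKES_AND_BLUNDERS)
print(f"ChatGPT good or better: {chatgpt_good}%")
print(f"Gemini mistakes and blunders: {gemini_errors}%")
```

The margin is slim, since the table's percentages are rounded, but the comparison holds.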

With White's 21.Ra1 entering the Poisoned Pawn variation, Black has a few options. The most important thing is to not be tempted to take the poisoned pawn on b5.

 - Gemini, with no enemy pawn on b5, just before hanging a bishop

The Method

ChatGPT was asked for a move to start a chess game. Gemini was asked for a response. After that, moves were requested using the following formulation: "White/Black replied [1...c5, 2.Nf3, etc.]. Play White/Black's nth move." Each AI played in its own single conversation thread, so it could remember the game so far.
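If you want to repeat the exercise, the prompt formulation is easy to generate. The sketch below is one way to do it in Python; `build_prompt` and `ordinal` are hypothetical helper names, and spelling out "nth" as "2nd", "17th", and so on is an assumption about how the template gets filled in.

```python
def ordinal(n: int) -> str:
    """Turn 1, 2, 3, 11, 21 into '1st', '2nd', '3rd', '11th', '21st'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def build_prompt(last_move: str, mover: str, move_number: int) -> str:
    """Build the follow-up prompt for the side that is about to move.

    last_move   -- the opponent's move as written in the game, e.g. "1...c5"
    mover       -- "White" or "Black", the side being asked to play
    move_number -- which of its own moves the mover is about to play
    """
    if mover not in ("White", "Black"):
        raise ValueError("mover must be 'White' or 'Black'")
    opponent = "Black" if mover == "White" else "White"
    return (f"{opponent} replied {last_move}. "
            f"Play {mover}'s {ordinal(move_number)} move.")


print(build_prompt("1...c5", "White", 2))
print(build_prompt("2.Nf3", "Black", 2))
```

Each prompt would go into that side's own conversation thread, so the model only ever sees the opponent's latest move plus its own history.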

To be fair to the AIs, this method effectively has them playing blindfold chess. But to be fair to us, they should be able to recreate the position much more easily than a human. 

An empty board, or a game of blindfold chess?

Should be. But these models are built on language, which makes it hard for them to translate move text into a geometric board position. This problem persists even when they get the whole game at once, and we know from the last article that ChatGPT can't even recreate a position from the FEN itself.
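For context on what "recreating a position from the FEN" involves: the first field of a FEN string lists the board rank by rank, from rank 8 down to rank 1, with digits standing for runs of empty squares. A minimal Python sketch of that expansion follows. The function name `fen_to_grid` is made up for illustration, and the example uses the standard starting position rather than anything from this game.

```python
def fen_to_grid(fen: str) -> list:
    """Expand the piece-placement field of a FEN string into an 8x8 grid.

    Row 0 is rank 8 (Black's back rank), row 7 is rank 1.
    Empty squares are shown as ".".
    """
    placement = fen.split()[0]      # first FEN field: piece placement
    ranks = placement.split("/")
    if len(ranks) != 8:
        raise ValueError("FEN must describe exactly 8 ranks")
    grid = []
    for rank in ranks:
        row = []
        for ch in rank:
            if ch.isdigit():
                row.extend(["."] * int(ch))   # digit = that many empty squares
            else:
                row.append(ch)                # letter = a piece
        if len(row) != 8:
            raise ValueError(f"rank {rank!r} does not cover 8 files")
        grid.append(row)
    return grid


START = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
for row in fen_to_grid(START):
    print(" ".join(row))
```

A few lines of ordinary code get the board right every time, which is what makes the language models' trouble with it so striking.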

Another problem: These AIs are both designed to equivocate, which works well if a user asks them a deep philosophical question, but less well if you just want one of them to play a darned chess move. If an AI listed multiple moves without recommending one, it was asked for a recommendation. If it recommended multiple moves, it was asked to pick one.

Here's the fun part: what about illegal moves, or moves that don't even exist, where an AI tries to make a capture with a piece it doesn't have on the board? With board vision this bad, both AIs tried to make a lot of both types. If one did, it was told the move was illegal and asked to pick another one. If it made illegal moves three straight times, it was given every move of the full game, which would usually, finally, trigger a legal move, albeit still not a good one.
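The illegal-move procedure can be sketched as a small loop. This is a rough Python outline, not the exact process used for the game (which was done by hand in chat windows). The callables `ask` and `is_legal` are hypothetical stand-ins for "send a prompt to the AI" and "check the move on a real board", the reminder wording is invented, and restarting the three-strike count after each recap is an assumption.

```python
from typing import Callable, List, Tuple


def request_legal_move(
    ask: Callable[[str], str],
    is_legal: Callable[[str], bool],
    first_prompt: str,
    game_so_far: List[str],
    max_attempts: int = 20,
) -> Tuple[str, int]:
    """Keep asking until the AI produces a legal move.

    Returns the legal move and how many illegal attempts came before it.
    """
    prompt = first_prompt
    illegal_count = 0
    streak = 0  # illegal attempts since the last full-game recap
    for _ in range(max_attempts):
        move = ask(prompt)
        if is_legal(move):
            return move, illegal_count
        illegal_count += 1
        streak += 1
        if streak == 3:
            # Three straight illegal tries: hand over every move so far.
            prompt = ("That move is illegal. The full game so far: "
                      + " ".join(game_so_far) + ". Pick another move.")
            streak = 0  # assumption: the count restarts after a recap
        else:
            prompt = "That move is illegal. Pick another move."
    raise RuntimeError("no legal move after max_attempts tries")
```

The `max_attempts` cap is there only so the sketch cannot loop forever; nothing in the original procedure sets a limit.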

On its 17th move, Gemini tried this whole mess (plus Nd7... twice) before finding a legal move (17...Rfe8). This was the first time during the game that illegal moves appeared, but it wasn't the last.

The final illegal move tally was Gemini 32, ChatGPT 6. That makes sense; it would have been crazy if the AI good enough to win was also bad enough to make more illegal moves. But it also means Gemini picked a legal move only 50% of the time, while ChatGPT did so more than 80% of the time.
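Those success rates are just legal moves divided by all attempts. The Python sketch below works backward from the numbers above. It assumes Gemini's 50% is exact and that both sides played the same number of legal moves; neither count is stated outright, so treat the 32 as an inference.

```python
def legal_rate(legal: int, illegal: int) -> float:
    """Share of all attempted moves that were legal."""
    return legal / (legal + illegal)


def implied_legal_moves(illegal: int, rate: float) -> float:
    """Solve rate = legal / (legal + illegal) for the number of legal moves."""
    return illegal * rate / (1.0 - rate)


# Gemini: 32 illegal attempts at a 50% success rate.
gemini_legal = implied_legal_moves(32, 0.50)

# ChatGPT: 6 illegal attempts, assuming the same number of legal moves.
chatgpt_rate = legal_rate(32, 6)

print(f"Implied legal moves per side: {gemini_legal:.0f}")
print(f"ChatGPT legal-move rate: {chatgpt_rate:.1%}")
```

Under those assumptions the figures hang together: a 50% rate with 32 illegal attempts implies 32 legal moves, and 32 legal moves against 6 illegal ones comes out above 80%.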

Conclusion

So that is what happens when two large language models try to play chess. Do any of the results really surprise you? Since ChatGPT won this game, should we try it against other, actual chess bots? Do you think it could beat Martin? Or, how quickly do you think Stockfish would win?

All we know for now is that you would not want to bet your life on ChatGPT finding a hanging queen. But if you had to choose between ChatGPT and Gemini, you know which one to pick.

Feel free to repeat this exercise yourself and share the results in the comments!

Nathaniel Green

Nathaniel Green is a staff writer for Chess.com who writes articles, player biographies, Titled Tuesday reports, video scripts, and more. He has been playing chess for about 30 years and resides near Washington, DC, USA.
