Arena GUI and Engine ELO consistency across hardware

Avatar of FaceCrusher

Arena makes it easy to set a UCI2 engine's strength with an adjustable ELO. But my question is: will setting the ELO strength (say 1800) on, say, Komodo or Arasan make it play at the same 1800 level across all computers?

The reason I wonder: if they are using some method of reducing the time the engine can think to bring down engine strength, it won't be the same benchmark across different computer speeds. Now ply depth, I know, is the same across every computer. It searches to a certain depth then stops, whether it's a 1984 computer or a new Ryzen. But ply is cumbersome, and doesn't work well for engine tournaments.

I want the ELO I set Komodo to today on my computer to be the same strength at the same ELO on a newer computer in the future, so that I have a consistent benchmark. Is this how it works?

Avatar of EscherehcsE

I can't answer your main question, but I should add that, as far as I know, Komodo 4 was the only version that used LimitStrength, and it was buggy as he11 at low LimitStrength settings. I think that after version 4, the Komodo programmers gave up on the idea of handicapping the engine.

Avatar of FaceCrusher

Well, reduced ply search is a guaranteed, immutable way to reduce playing strength. Doesn't matter if it's the mighty Stockfish or Komodo: you set the ply limit to 4, and you can probably beat it. No way around that. And that will always be the same on any computer. Nevertheless, I'm hoping to see the same consistency with the Arena ELO setting for engines that support it, across different hardware.

Avatar of EscherehcsE

I understand what you're saying, but your original post implied that you'd set the Komodo engine to 1800 elo, which is impossible using Arena (and maybe any other GUI, too). Komodo 4 only had strength settings of 0 to 15.

Sure, you can reduce the ply search, but in no way is it directly correlated to a specific elo level.

Avatar of FaceCrusher

You're right Escher, Komodo doesn't have a settable ELO in Arena. These engines do:

Arasan
Rybka
Shredder
Fruit
Hiarcs
Houdini

 

I've played hundreds of games with them set at 5 ply and reduced ELO, and while they may not play perfectly human, they still play very well, and approximate human play well enough in 2017 to be excellent to train against. Full strength Stockfish analyzing the games found they are pretty consistent with move strength. Maybe in the past they had to make a terrible blunder to be "weaker" but that doesn't seem to be the case today. 

I am just wondering if setting the ELO of Shredder or Houdini to 1800 will mean that they play at the same strength today as they will in 2024, when I play them at 1800 on a computer 4x more powerful.

Avatar of EscherehcsE
FaceCrusher wrote:

I am just wondering if setting the ELO of Shredder or Houdini to 1800 will mean that they play at the same strength today as they will in 2024, when I play them at 1800 on a computer 4x more powerful.

I guess you'd have to know the details of how each programmer implemented the LimitStrength algorithm for his engine. If open source, you could look at the source code. If closed source, you'd have to ask the programmer, although he might not be willing to disclose that information.

I'm not a programmer, but I'd guess that if the engine implements LimitStrength via some combination of number of nodes searched and random blunders for any elo level, then it should be independent of cpu speed.

Avatar of EscherehcsE

I guess the only problem would be if calculation time was a factor in the LimitStrength algorithm. As long as it's only number of nodes or number of plies (along with the randomness factor), I think the elo strength wouldn't change when you get a newer computer.
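To make the distinction concrete, here is a rough sketch of the text commands a GUI sends to an engine under the UCI protocol. The option names UCI_LimitStrength and UCI_Elo and the go depth / go nodes / go movetime forms come from the UCI specification; the helper function names are made up for this illustration, and whether a given engine honors these options, or how it implements them internally, is engine-specific and not something this sketch can tell you.

```python
# Sketch only: builds UCI command strings, it does not talk to a real engine.
# Function names are hypothetical; the command text follows the UCI protocol.

def limit_strength_commands(elo):
    """Commands that ask a UCI engine to play at a target rating."""
    return [
        "setoption name UCI_LimitStrength value true",
        f"setoption name UCI_Elo value {elo}",
    ]


def go_command(kind, amount):
    """Build a search command with a single stopping condition.

    kind is one of:
      "depth"    - stop after this many plies (hardware independent)
      "nodes"    - stop after this many positions (hardware independent)
      "movetime" - stop after this many milliseconds (hardware dependent)
    """
    if kind not in ("depth", "nodes", "movetime"):
        raise ValueError(f"unsupported limit: {kind}")
    return f"go {kind} {amount}"


if __name__ == "__main__":
    for line in limit_strength_commands(1800):
        print(line)
    print(go_command("depth", 5))
    print(go_command("nodes", 100000))
    print(go_command("movetime", 1000))
```

Only the movetime form ties the amount of searching to the speed of the machine. What an engine does internally once UCI_LimitStrength is switched on is a separate matter and is up to its programmer.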

Avatar of FaceCrusher
EscherehcsE wrote:

I guess the only problem would be if calculation time was a factor in the LimitStrength algorithm. As long as it's only number of nodes or number of plies (along with the randomness factor), I think the elo strength wouldn't change when you get a newer computer.

 

Exactly, and now you're at the heart of it. The main program I've used to play against for over 10 years is an old program called KChess from 2004. Why? Because I KNOW how strong it is, and it's been the same strength on 10 different computers. It goes by ply.

If the ELO setting on ELO-compliant engines in Arena uses a CPU-independent method to reduce strength, i.e. nodes or ply, then it's perfect for me. If it uses a CPU-dependent method to reduce strength, i.e. time (more calculations/s can be done on faster machines in the same amount of time), then I'll have no way to dial the strength equally between hardware. Thus, no standard benchmark.

And that's what I'm hoping someone here might know.
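The arithmetic behind that worry can be sketched in a few lines. The nodes-per-second figures below are invented for illustration (one machine 4x faster than the other), and the function names are hypothetical; the point is only that a time limit lets the faster machine examine more positions, while a node budget does not.

```python
# Sketch under assumed numbers: compares how much an engine gets to search
# under a time limit versus a fixed node budget on two machines.

def nodes_under_time_limit(nodes_per_second, seconds):
    """Positions examined when the engine is stopped by the clock."""
    return int(nodes_per_second * seconds)


def nodes_under_node_limit(nodes_per_second, node_budget):
    """Positions examined when the engine is stopped by a node count.

    The speed of the machine changes how long the search takes,
    not how many positions are examined.
    """
    return node_budget


OLD_PC_NPS = 1_000_000  # hypothetical speed of today's computer
NEW_PC_NPS = 4_000_000  # hypothetical computer that is 4x faster

if __name__ == "__main__":
    for name, nps in (("old", OLD_PC_NPS), ("new", NEW_PC_NPS)):
        timed = nodes_under_time_limit(nps, 1.0)
        fixed = nodes_under_node_limit(nps, 100_000)
        print(f"{name} PC: 1 second -> {timed} nodes, node budget -> {fixed} nodes")
```

A ply limit behaves like the node budget here: the search stops at the same point on any machine, only sooner on the faster one.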

Avatar of EscherehcsE

It's possible that someone here might have a solid answer to your question, but I doubt it. Most people are only interested in the maximum strength of an engine, and handicap levels seldom get discussed. You might have better luck signing up at the Talkchess.com forum; they have many more engine programmers there than Chess.com does.

 

However, the best approach might be to simply e-mail the programmers of the engines you're interested in and see what kind of answers you get.

 

The Houdini web site implies that Houdini is independent of cpu speed, although it uses the weasel word "mostly":

The strength reduction is mostly based on a combination of two techniques:

Limiting the number of positions searched - this reduces mostly the tactical strength of the engine;
Purposely picking a move the engine knows is not optimal - this reduces mostly the positional strength of the engine.

The combination of the two produces a game with both tactical and strategic (positional) flaws.
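As a toy illustration of the second technique (deliberately picking a move that is not the best one), here is a minimal sketch. It is not Houdini's actual algorithm, which is closed source; the function name, the blunder_probability parameter and the example scores are all assumptions made up for this example. The move list is assumed to come from a search that was already cut off by a node limit, so nothing here depends on CPU speed.

```python
import random

# Toy sketch only, NOT Houdini's real method. Names and numbers are made up.

def pick_move(scored_moves, blunder_probability, rng):
    """Choose a move from (move, score) pairs.

    scored_moves: list of (move, score_in_centipawns), assumed to come from
                  a node-limited search.
    blunder_probability: chance of deliberately playing a non-best move.
    rng: a random.Random instance, so results can be reproduced with a seed.
    """
    ranked = sorted(scored_moves, key=lambda pair: pair[1], reverse=True)
    if len(ranked) > 1 and rng.random() < blunder_probability:
        # Deliberately play something other than the top choice.
        return rng.choice(ranked[1:])[0]
    return ranked[0][0]


if __name__ == "__main__":
    candidates = [("e2e4", 30), ("d2d4", 25), ("a2a3", -10)]
    rng = random.Random(2017)
    print([pick_move(candidates, 0.3, rng) for _ in range(10)])
```

Both inputs (a node count and a random draw) are independent of how fast the computer is, which is the property being asked about in this thread.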

 

The HIARCS and Shredder documentation don't discuss the details of the LimitStrength algorithm. I don't think the Arasan site discusses it, either, although Arasan IS open source.

Avatar of FaceCrusher

Okay, well now that does get me closer to what I was looking for than a lot of what I've come across, thanks. Houdini might be one of the leading candidates for a main training engine then. What I have been doing is playing dozens of games with the engines at reduced ELO against KChess at 5 ply, which is always the same strength. I am coming up with some useful stats. I can then run this experiment again at work, where my computer is far, far more powerful. If it is far out of balance with my home results, I'll know the ELO is not consistent on different hardware. This is, however, a cumbersome method. But the reduced-strength part of the manuals is a good find, thanks. And I will pop over to Talkchess, as well as some forums dedicated to just computer chess and engines, to take a look.
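For anyone wanting to run the same home-versus-work comparison, the match results can be turned into a rating gap with the standard Elo expected-score formula. This is only a sketch: the win/draw/loss counts below are invented, the function names are hypothetical, and with a few dozen games the result will be noisy, so only a large gap between the two machines would mean much.

```python
import math

# Sketch with made-up results: converts a match score against a fixed
# opponent (e.g. KChess at 5 ply) into an approximate Elo difference.

def score_fraction(wins, draws, losses):
    """Fraction of points scored, counting a draw as half a point."""
    games = wins + draws + losses
    if games == 0:
        raise ValueError("no games played")
    return (wins + 0.5 * draws) / games


def elo_difference(score):
    """Rating gap implied by a score fraction, from the Elo expected-score
    formula. Undefined for 0% or 100% scores."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


if __name__ == "__main__":
    # Hypothetical results of the same engine at the same ELO setting,
    # against the same fixed-ply opponent, on two machines.
    home = elo_difference(score_fraction(wins=30, draws=10, losses=20))
    work = elo_difference(score_fraction(wins=45, draws=5, losses=10))
    print(f"home: {home:+.0f}, work: {work:+.0f}, gap: {work - home:+.0f}")
```

If the gap stays near zero across machines, the ELO setting is probably not tied to CPU speed; a consistently large gap on the faster machine would suggest that time is part of the handicap.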