Why do bots have ratings?

Alchessblitz

In most chess programs the programmer assigns an Elo rating to the different bots he creates because there are ranked games where we win or lose more or fewer Elo points depending on the result of our matches, the bot's Elo and ours.
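
For reference, the standard Elo update that such ranked games are based on looks something like this (a minimal Python sketch; the K-factor of 20 is just an illustrative choice, sites pick their own values):

# Standard Elo expected score and rating update (illustrative sketch).
def expected_score(own_rating, opp_rating):
    return 1 / (1 + 10 ** ((opp_rating - own_rating) / 400))

def updated_rating(own_rating, opp_rating, score, k=20):
    # score is 1 for a win, 0.5 for a draw, 0 for a loss
    return own_rating + k * (score - expected_score(own_rating, opp_rating))

# Beating a much higher-rated bot gains far more than beating an equal one.
print(round(updated_rating(1500, 2200, 1)))  # ~1520
print(round(updated_rating(1500, 1500, 1)))  # ~1510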

Here the programmer didn't make ranked games, but he made "the setup" in case he decides one day to add this game mode (ranked games).

There are levels:

1) newbie: Martin to Karim

2) intermediate: Emir to Mateo

3) advanced: Antonio to Manuel

4) master: Noam to Wei

Many video games offer 1) easy 2) normal 3) hard 4) very hard (so he does much the same thing, except he names it differently), and for each bot he created he assigns an Elo.

The real problem afterwards is that it tries to compare anything with anything. It compares, for example, bot Ahmed (2200) playing at 1 second per move with human Jean-Kevin (2200) playing at 1 hour 30 minutes for 40 moves, then 30 minutes.

Tactrix
AngusByers wrote:
Tactrix wrote:
AngusByers wrote:
Tactrix wrote:
AngusByers wrote:

I think part of the issue is how the review analysis comes up with "estimated Elo" values. It factors in the Elo information of the players involved, so both yours and the bot's in this case.

I wouldn't be surprised if you did the following.

Take one of your bot games and review it, get the Elo estimate for you and the bot.
Now, download the PGN for that game, open it in a text editor, and remove the tag with the bot's Elo, and upload that version. Review that.
Do that again, but this time leave the bot's Elo in and remove yours.

I bet the estimated Elo for you and the bot will be different in each of those, despite it being the same game.
I'm curious, though, as to whether or not the difference in the estimated Elos remains fairly constant (i.e. in the above, is the winner always rated +200 over the loser?). If so, then the more important information you probably want to take from the review estimates is how much stronger you (or your opponent) played, while taking the actual value of the Elos with a grain of salt.
When this feature was originally released, it would provide estimated Elo values for games that had no information about the player's Elo. It was scoring my games as over 2000! Even I knew that wasn't accurate, but it was fun to see.

It doesn't rate games based on your Elo vs the game. It rates them based on the overall hit rating of the game itself. If you took my Elo away and Magnus Carlsen's Elo away, and we played a game, I'd still rate around 1000 and he'd still rate around 2800, because it doesn't use the average games I play or he plays as a baseline; it rates the hit rating overall and compares it to the entire community of players.

Really? Do as I suggested above. Post the results. If I'm wrong and you're right, then the estimated Elo for your games will be the same whether or not you analyse them with the Elo tags included.
Since I already know the answer to this (having done it myself), I know you're wrong, and you're posting what you think rather than what you know. Might explain your chess too, come to think of it.

Can't you do that? It seems like a lot of work for me just to confirm your own conclusion; if you want it, you can do it with my games, just click on my name and you'll see my games.

I bolded the bit that provided the information you then asked for. I have done it. I don't expect you to just believe me, since people can say anything, so I suggested you do it yourself as well. I got the information about how the Elo estimates are calculated from Chess.com, in a thread that got into discussing them when it stopped providing estimates for games without player Elo tags.

I have no reason to think you're lying, it just struck me as strange that bots with elos can be so far off the elo they're supposedly programmed for.

IsniffGas
Tactrix wrote:
I have no reason to think you're lying, it just struck me as strange that bots with elos can be so far off the elo they're supposedly programmed for.

because he's lying

Tactrix
Alchessblitz wrote:

In most chess programs the programmer assigns an Elo rating to the different bots he creates because there are ranked games where we win or lose more or fewer Elo points depending on the result of our matches, the bot's Elo and ours. [...]

I always assume there will be some level of variation, but my issue is with how far that variation goes. My expectation is that it swings 200-300 above and below the elo it's supposedly playing at. But when it swings 800-1000 under it, there's something seriously wrong.

Alchessblitz

This is a game between two bots whose "algorithms" differ from the chess.com bots, and a 2200 Shredder playing at 15m10s per move is strong.

I am convinced, without having been able to verify it, that the game review will give results that seem false, not credible or incomprehensible to us, when we know that it is a game between two strong bots.

Alchessblitz
Tactrix wrote:

I always assume there will be some level of variation, but my issue is with how far that variation goes. My expectation is that it swings 200-300 above and below the elo it's supposedly playing at. But when it swings 800-1000 under it, there's something seriously wrong.

I have the impression that you don't understand "how to make a bot". Basically, and in theory, we first create the program at maximum level, then from the maximum level we use "a mathematical formula" to lower it, such as reducing the number of positions calculated per second, or "an error handicap" based on the pawn unit.

So in short, it is by default the maximum-level program that plays, making more or fewer blunders, less serious errors or (...) according to the handicap given by "the mathematical formula".
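
A rough sketch of that idea in Python (my illustration only, not how chess.com's bots are actually built; the node limit, the number of candidate moves and the 150-centipawn "error handicap" are made-up knobs, and it assumes the python-chess library plus a UCI engine binary such as stockfish on the PATH):

# Weaken a full-strength engine by limiting search effort and allowing
# any candidate move within an "error handicap" of the best one.
import random
import chess
import chess.engine

def handicapped_move(board, engine, max_nodes=50_000, error_handicap_cp=150):
    # Ask the full-strength search for its top candidate moves.
    infos = engine.analyse(board, chess.engine.Limit(nodes=max_nodes), multipv=4)
    scores = [info["score"].pov(board.turn).score(mate_score=100_000) for info in infos]
    best = scores[0]
    # Keep every candidate within the allowed centipawn loss, then pick one at random.
    playable = [info["pv"][0] for info, cp in zip(infos, scores) if best - cp <= error_handicap_cp]
    return random.choice(playable)

engine = chess.engine.SimpleEngine.popen_uci("stockfish")
board = chess.Board()
print(handicapped_move(board, engine))
engine.quit()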

Alchessblitz

No luck: the game review gives 2200 for the Shredder bot (2200), but we can see that there are "lots of anomalies".

a: Like many people, we have already obtained the 2200 rating and even more, and that doesn't mean that bot.Shredder (2200) won't be too strong for us.

b: It gives an accuracy figure of 93.9 while it was Hiarcs at maximum level that played, and likewise we can get figures just as high and we will still have no chance against Hiarcs.

Ultimately, what I mean is that the data the chess.com program produces "cannot always be taken as something relevant".

Tactrix
Alchessblitz wrote:
I always assume there will be some level of variation, but my issue is with how far that variation goes. My expectation is that it swings 200-300 above and below the elo it's supposedly playing at. But when it swings 800-1000 under it, there's something seriously wrong.

I have the impression that you don't understand "how to make a bot". Basically, and in theory, we first create the program at maximum level, then from the maximum level we use "a mathematical formula" to lower it, such as reducing the number of positions calculated per second, or "an error handicap" based on the pawn unit.

So in short, it is by default the maximum-level program that plays, making more or fewer blunders, less serious errors or (...) according to the handicap given by "the mathematical formula".

I knew and factored in everything you just said. None of that explains why bots are falling so grossly outside the norms. If you make a program and program it with errors to take it from, let's say, 3000 to 1500, there is no reason why that program should at any point play with enough errors to turn that game into a 500. Otherwise the person who programmed it doesn't understand how to calculate proper margins to account for playing within the 1500 variable range. That proves my point that labeling a bot with any Elo is pointless.
Furthermore, if you think "well, the computer can't account for how the players play", yes it can, because it has the unique ability to calculate enough moves to account for any player at that Elo. At some point they weren't that advanced, but right now Stockfish can outright decimate any chess player, including the very best ones.
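
For a sense of scale (my own arithmetic using the standard Elo expected-score formula, nothing specific to chess.com's bots):

# Expected score of a genuine 1500 against a 500 under the Elo model.
print(1 / (1 + 10 ** ((500 - 1500) / 400)))  # ~0.997, i.e. near-certain wins

So a bot labeled 1500 that regularly produces 500-level games implies the error handicap swings far wider than the label suggests.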

Bmpz24

This might be a little off topic, but I once saw a video on YouTube where Magnus played against the strongest/newest version of Stockfish, which had an Elo rating of 4000. It completely obliterated Magnus; he didn't have the slightest chance against it. It literally didn't last long, he was finished after just a few moves. It could anticipate each of Magnus's moves many moves ahead. Just like Magnus would obliterate a beginner: that's how huge the difference was between Magnus and that 4000-Elo Stockfish. And that is Magnus, the world champion, we're talking about, who IIRC is the only human who got a draw against chess.com's engine 25 maximum (3200). Just amazing, the strength of that Stockfish.

Tactrix
Bmpz24 wrote:

This might be a little off topic, but I once saw a video on YouTube where Magnus played against the strongest/newest version of Stockfish, which had an Elo rating of 4000. It completely obliterated Magnus; he didn't have the slightest chance against it. It literally didn't last long, he was finished after just a few moves. Just like Magnus would obliterate a beginner: that's how huge the difference was between Magnus and that 4000-Elo Stockfish. And that is Magnus, the world champion, we're talking about, who IIRC is the only human who got a draw against chess.com's engine 25 maximum (3200). Just amazing, the strength of that Stockfish.

I agree, it's not even close to a contest anymore; computers' calculation has far surpassed regular people. So it's literally just a matter of programming them to play worse, since they can outplay us any other way.

Alchessblitz
Tactrix wrote:

I knew and factored in everything you just said. None of that explains why bots are falling so grossly outside the norms. If you make a program and program it with errors to take it from, let's say, 3000 to 1500, there is no reason why that program should at any point play with enough errors to turn that game into a 500 [...]

Not sure I understand, but in any case I want something concrete. In the game I posted in this topic, where do you see bot.Shredder (2200) "falling so grossly outside the norms"?

My understanding of the game:

1) d4 Nc6 2) c4 e5: all this makes sense. If Hiarcs plays 3. dxe5 Nxe5, we get a position that I think we don't really like as humans, because "the center is broken", the Knight is centralized and the dark-squared Bishop can quickly emerge; 3. d5 instead leads "into the Indian-style position". 3) d5 Nb8: in theory a human almost always plays Nce7, but Nb8 is a good move too, since Shredder (2200) got what he wanted, and Nce7 also hinders the dark-squared Bishop.

4) e4: I imposed this move on Hiarcs in order to fall "into the Indian-style position" and "better understand what is happening on the chessboard". 4)...Bb4+ is a basic bot move, because bot.Shredder (2200) sees that he can win a development tempo by checking the King; after, for example, 5. Nc3 Bxc3+ 6. bxc3, bot.Shredder (2200) will be happy because he wants to play a closed position and he prefers Knights to Bishops in closed positions.

5) Bd2 Bxd2+ 6) Qxd2 d6 7) Bd3: the idea is to prevent or discourage ...f5; for example, after 7. Nc3 maybe 7...f5. 7)...c5: this move is strategically logical, it aims to lock the 0-0-0 side, and since bot.Shredder (2200) has the space advantage on the 0-0 side, it's all good for him.

8) Nc3: this move is a strategic error IMO; Hiarcs should have played 8. dxc6 and, for example, played in the long term against the weakness of the d6 pawn. 8)...Nf6: I would have preferred 8...Ne7 with the idea of preparing f5. After 8)...Nf6 I have the impression there is no longer a strategic plan; bot.Shredder (2200) is playing a bit for a draw.

I am also trying to say that it is often with this kind of timid move that bots play badly, because they break the potential strength of the position and end up (in quite a few cases) with "positions that are not losing but not winning either".

9) f4: the move the Coach marks as ?!.

IMO 9) f4 is super important to have a chance of winning; otherwise how can we hope to win? By opening the game with b4 or f4.

9)...exf4: the Coach says it's an excellent move, but concretely Hiarcs has the better position. I'd even say it is the move 8)...Nf6 that caused bot.Shredder (2200) to lose, by being refuted by 9) f4.

10) Nf3 O-O 11) O-O Qe7: i.e. bot.Shredder (2200) doesn't really have a plan, so by default he plays to develop his pieces. He could have played 11...Qb6; it would have been much the same for him. 12) Qxf4: the Coach suggests another move, but simply 12. Rae1 and Hiarcs is sure to be better. 12)...Nh5: maybe with the idea of playing ...f5. 13) Qh4: Hiarcs wants to exchange Queens to reduce the complexity of the game and win by giving his opponent no chance.

The rest of the game was more or less calculated.

Ultimately the critical position is this:

and IMO it is 8)...Nf6, the bad move that will probably lead bot.Shredder (2200) towards defeat, which the Coach says is good.

Tactrix
Alchessblitz wrote:
Not sure I understand, but in any case I want something concrete. In the game I posted in this topic, where do you see bot.Shredder (2200) "falling so grossly outside the norms"? [...]

I think you think I'm talking about one specific game, but I'm not; I'm talking about a series of games that I've seen while playing on here. A decent chunk of them act correctly, but some of them don't. They simply act as if you're playing either a far better bot or a far worse one. Those are the ones I'm questioning.

AngusByers

Just with respect to what I posted above. The estimated Elo values given for a game include the Elos of the two players. You have to provide at least one player's Elo. Here's the analysis of one of my games against the Zari-bot. It estimates my Elo at 1300 and Zari at 700 when I include both our Elo tags.
I then removed mine and re-analysed the game. And now, because it only has Zari's Elo, it estimates me at 1500 and Zari at 800. I had thought the difference was a constant, but we differ by 600 in the first analysis and by 700 in the 2nd, so even that gets a bit wobbly. Also note, the feedback with respect to each phase of the game also changes. In the first, it scores my middle game as ! while in the 2nd (without my Elo tag included) it is a *, one step lower. But what is "excellent" for an 800-900 Elo might only be OK if you're 1200-1500, etc.
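
If anyone wants to repeat the tag-removal step without hand-editing, here is a minimal sketch (assuming a plain-text PGN with standard [WhiteElo]/[BlackElo] tag pairs; the file names are just placeholders):

# Strip the Elo tag lines from a PGN before re-uploading it for review.
import re

with open("zari_game.pgn") as f:  # placeholder input file
    pgn = f.read()

# Remove both Elo tags; adjust the pattern to keep one of them.
stripped = re.sub(r'^\[(WhiteElo|BlackElo) "[^"]*"\]\s*\n', "", pgn, flags=re.MULTILINE)

with open("zari_game_noelo.pgn", "w") as f:  # placeholder output file
    f.write(stripped)
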
Estimating a program's Elo based upon the code and settings is not easy, and probably not really possible. One has to obtain the bot's Elo by having it play rated games, just like everyone else. However, it appears that the way things work on Chess.com is that the engine runs on one's local device, so the same bot played on an older phone will perform differently than when played on a new top-end computer, which means the bot's "strength" is not due to the code/settings alone but also to what device it gets played on. This makes estimating a bot's Elo difficult and unreliable.
When I used to play on FICS back in the 90s, there were computer accounts (flagged as such), where you could play rated games against various chess engines, but those engines were not running on the player's device so their performance only varied due to changes in the code/settings. The ChessMaster series has various "personalities", and it tries to factor in your computer's performance to estimate the different Elos for given personalities on the computer it is installed (well, the CM9000 does; I suspect the newer ones do too, but not sure about any of the earlier versions). It would be interesting if "bot accounts" could be set up so that the various personalities could obtain proper ratings, but I think there is some concern that people would "farm" the bots for Elo. I suppose bot games could be "rated for the bot" but remain unrated for players, allowing an estimate of the bot's Elo to be derived that way?
Anyway, that's a bit of an aside; I just wanted to point out that the estimated Elo calculation (and apparently the scoring of the opening/middlegame/endgame phases) is in part dependent upon the Elo of the players. I only have a free account, so I've used my 1 free analysis to do this today (I had the full analysis of the Zari game already).
Here's the game, and below it the output from the "Game Review" when I include both Elo Tags (first one) and when I remove mine (second one).

Tactrix

@angusbyers I also used to play ChessMaster, and the thing I loved about that game is that it never dealt with Elos; it would just simply say "beginner, intermediate, expert, etc." I think they had the right idea, because it seems difficult for a computer to maintain a specific Elo in this game, no matter how hard it tries, unless it's the lowest or the highest one.

KieferSmith

Bots' ratings represent the approximate skill level of that bot. They aren't always accurate, as sometimes bots play much better or much worse than their rating would suggest, but they are usually close. They exist to give you an idea of how difficult that bot is; beginners should play against the beginner-level bots, and advanced players should play against the advanced-level bots. If bots didn't have ratings, you wouldn't know which bot is the best for the kind of game you want. You should play against a bot that's approximately your rating, but that would be impossible if the bots didn't have ratings in the first place. Someone wanting a challenge might wind up playing against a bot with the skill level of Martin.

RobloxStudioScripter
Idk
Tactrix
KieferSmith wrote:

Bots' ratings represent the approximate skill level of that bot. They aren't always accurate, as sometimes bots play much better or much worse than their rating would suggest, but they are usually close. They exist to give you an idea of how difficult that bot is; beginners should play against the beginner-level bots, and advanced players should play against the advanced-level bots. If bots didn't have ratings, you wouldn't know which bot is the best for the kind of game you want. You should play against a bot that's approximately your rating, but that would be impossible if the bots didn't have ratings in the first place. Someone wanting a challenge might wind up playing against a bot with the skill level of Martin.

Yea, that's the general idea. It's just that sometimes they're so far outside their rating that it's mind-boggling.

AngusByers
Tactrix wrote:

@angusbyers I also used to play ChessMaster, and the thing I loved about that game is that it never dealt with Elos; it would just simply say "beginner, intermediate, expert, etc." I think they had the right idea, because it seems difficult for a computer to maintain a specific Elo in this game, no matter how hard it tries, unless it's the lowest or the highest one.

Yah, I think the early versions were like that (had "personalities" like Light, Moderate, Difficult, etc). I think starting with the CM6000 they introduced more personalities, and gave them estimated ratings. I know the CM9000, which I still have, gives the various personalities Elo values, and depending upon the computer specs, those values will change. So if you install CM9000 on an older and slower computer, the ratings go down and if you install it on a new machine, they go up. They tried to work out some sort of relationship between the Elo and computer specs. I don't pay too much attention to the actual values, though, and just use them to order them and then find roughly a "CM based rating" that gives me a good challenge, but where I have a chance of winning. It is a good chess engine, not as strong as they are now of course, but more than capable of subjecting one to ritual humiliation if you aren't careful (and on the higher levels, even then).
Anyway, I can't recall if CM worked out those rating values by testing the various settings against people at the time, or how they came up with them. I seem to recall they did test them against FIDE-rated players in some clubs, but then I seem to recall a lot of things that never happened sometimes!