Fritz for Fun 13 - "Weak" ELO computer ratings?

hhnngg1

I recently got Fritz for Fun 13 as a download off Amazon.com. I've used older versions of Fritz (before the Windows 7 updates rendered it obsolete) so it wasn't so new to me.

 

I recently decided to start playing some 'rated' blitz games against Fritz, to see how my rating stacks up against a CPU rating. I set it to 60' rapid mode but played it like a 5' blitz game (for me; the CPU responds near-instantly). Because it's a rated game, you can't just stop the game and do a takeback: you have to resign it or win/lose outright.

 

I'm only a lowly 1250ish blitz player here, so I was more than a bit shocked to promptly beat Fritz set at 1850 level 3x in a row quite easily. I would guesstimate that Fritz's play level in these games against me was equal to that of a 1150 on chess.com at 5-minute blitz.

 

Am I setting the CPU up correctly? 

 

 

Here is an example game I just played - I was white, and Fritz actually resigned the position in the midgame! I'd estimate the level of play at 1150 or 1200 at best compared to a chess.com 5-min blitz player.

The time control was set to 60 minutes, but I used up less than 3 minutes of my clock time to win the game.

And here's the one I played right before it, which it also resigned with less than 3 mins of my clock time used.

And I'll get out in front of the trolls and say outright - NO, I am SURE I am nowhere near 1800 in any rating system - I've played enough games to know that, so this isn't a post trolling to find out 'how awesome I am' - if I was that awesome, I wouldn't be a 1250 blitz player!
 
For reference, I can beat the chess.com android phone engine about 50% of the time if it's set at 1600 level on my phone if I'm playing it 'for real' in a 5' blitz game, and it puts up much more resistance than these two listed Fritz games. 
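
To put the 3-0 result in perspective, here is a small sketch using the standard Elo expected-score formula. It assumes (and this is exactly what is in doubt in this thread) that the 1250 and the 1850 sit on the same rating scale and that the games are independent. The variable names are just for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score (win = 1, draw = 0.5) for A against B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Assumption: both numbers are on one common scale.
p = expected_score(1250, 1850)

# The expected score counts draws as half points, so it is an upper bound
# on the win probability. Cubing it bounds the chance of three straight wins,
# assuming the games are independent.
three_wins_bound = p ** 3

print(f"expected score per game: {p:.4f}")
print(f"upper bound on three wins in a row: {three_wins_bound:.6f}")
```

If the two ratings really were comparable, three easy wins in a row would be a very rare event, which supports the suspicion that the "1850" setting is not an 1850 in the chess.com blitz pool.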
 
VyboR

I do not own Fritz, so I cannot help you with that. I share your opinion, it plays nowhere near 1850.

This is a common problem when you don't set chess engines at full strength. Some engines occasionally pick the 2nd or 3rd "best move" to mimic lower strength, even in positions where only the 1st move avoids losing on the spot. Another downside is that these blunders can look inhuman, as the computer does not know whether a blunder is hard to spot (which can be human) or completely bad (e.g. first game 11...Bxf2+??).

What you could do is set the engine "at full strength", but only allow it to search x plies deep. That should at least prevent these horrible 1-move blunders.
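
For engines that speak the UCI protocol, a fixed-depth search is requested with the standard `go depth` command. The sketch below only builds the command strings; the helper name is made up for illustration, and whether Fritz for Fun 13 exposes anything like this for its own engine is not known here.

```python
def depth_limited_commands(moves_so_far, depth):
    """Build the UCI commands that ask an engine to search to a fixed depth.

    moves_so_far: moves in UCI long algebraic notation, e.g. ["e2e4", "e7e5"].
    depth: search depth in plies (half-moves).
    """
    if depth < 1:
        raise ValueError("depth must be at least 1 ply")
    position = "position startpos"
    if moves_so_far:
        position += " moves " + " ".join(moves_so_far)
    # The engine is otherwise left at full strength; only the depth is capped.
    return [position, f"go depth {depth}"]


for command in depth_limited_commands(["e2e4", "e7e5"], 4):
    print(command)
```

A GUI would normally send these lines for you once you pick a "fixed depth" level, so this is mainly to show what such a setting boils down to.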

hhnngg1

I was just surprised, because I saw a few recommendations online for chess improvement programs: one said to play Fritz a bunch of times to get a rough ELO estimate to benchmark against, and another used playing Fritz at the 1800 level as its baseline. So it seemed like some people were trusting in the consistency of its play.

kleelof

Those rating values in software are there to sell the software, not to give a realistic measure of your skill.

EscherehcsE

One of the reasons I'm not a big fan of Chessbase products is that they seem to not volunteer a lot of detailed information about their software. I really think they do it on purpose; I don't think it's just laziness.

Does Fritz for Fun 13 allow you to import other UCI engines? If so, maybe you could try to install a weaker engine rated somewhere around 1800. A good candidate is Roce 0.0390, which is rated 1850 on the CCRL 40/4 computer rating list:

http://www.computerchess.org.uk/ccrl/404/rating_list_all.html

Roce is a nice, stable engine. (You have to be careful when choosing weaker engines; many are buggy in various ways.)

http://www.rocechess.ch/rocee.html

 

Another option would be to try another strong engine that can be dumbed down. You've already discovered the problem to watch out for in this case - The claimed elo level of virtually every dumbed-down engine is just an estimate, or even a wild guess in many cases. The claimed elo level can be a little off, or it can be a lot off.

A few nice "dumb-downable" UCI engines that probably aren't TOO far off with their elo estimates are Rodent 1.7, Rhetoric 1.4.1, and Ufim 8.02. (If you install Rodent, you must make a change to the rodent.ini file with a text editor; Change "user normal" to "user power". (Without the quotes.))

http://www.pkoziol.cal24.pl/rodent/rodent_download.htm

(Warning: The Rhetoric download page is in Spanish):

http://www.chessrhetoric.com/index.php/downloads/viewcategory/1-rhetoric

http://wbec-ridderkerk.nl/html/details1/Ufim.html

(P.S. - Oh yeah, when configuring Ufim, make sure you check the "Delay on Weak Levels" option.)

hhnngg1

I'd rather not install separate engines. 

 

Honestly, I'm surprised at the notion that Fritz-tuned ratings should be hundreds of points off reality - it was (might still be) after all THE flagship analysis/play chess program for serious chess players for decades, so it's surprising to me that it should be set up in a way that Fritz set at 1800 ELO plays at 1100.  

 

Which is why I'm wondering if I've got the software set up wrong (I couldn't find any settings that might be left on to further dumb down its performance past the ELO slider.)

 

Surely other folks around here have played Fritz themselves and can comment on how strong they feel it is, for example, if they set it at their own ELO level and played it?

EscherehcsE
hhnngg1 wrote:

I'd rather not install separate engines. 

 

Honestly, I'm surprised at the notion that Fritz-tuned ratings should be hundreds of points off reality - it was (might still be) after all THE flagship analysis/play chess program for serious chess players for decades, so it's surprising to me that it should be set up in a way that Fritz set at 1800 ELO plays at 1100.  

 

Which is why I'm wondering if I've got the software set up wrong (I couldn't find any settings that might be left on to further dumb down its performance past the ELO slider.)

 

Surely other folks around here have played Fritz themselves and can comment on how strong they feel it is, for example, if they set it at their own ELO level and played it?

It takes a LOT of testing to accurately calibrate an elo curve for an engine...which is probably why nobody wants to do it.
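
As a rough illustration of "a LOT of testing", here is a back-of-the-envelope sketch of how many games it takes to pin down one rating setting. It assumes independent games, no draws, and an opponent of about equal strength, and it uses a simple normal approximation. The function names are made up for this example.

```python
import math


def elo_margin(games: int, score: float = 0.5, z: float = 1.96) -> float:
    """Approximate margin of error in Elo points after `games` decisive games.

    score: true per-game score against the test opponent (0.5 = evenly matched).
    z: normal quantile (1.96 is roughly a 95% interval).
    """
    # Standard error of the observed score fraction (binomial, no draws).
    se_score = math.sqrt(score * (1.0 - score) / games)
    # Slope of the Elo difference D = 400 * log10(p / (1 - p)) with respect to p.
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    return z * se_score * slope


def games_needed(margin_elo: float, score: float = 0.5, z: float = 1.96) -> int:
    """Smallest number of games for which elo_margin is at most margin_elo."""
    margin_after_one_game = elo_margin(1, score, z)
    return math.ceil((margin_after_one_game / margin_elo) ** 2)


print(games_needed(50))
print(round(elo_margin(100), 1))
```

Under these assumptions it takes on the order of a couple hundred games to get within about 50 points for a single slider position, and a full curve needs that repeated for every setting.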

I tried to find some details of the Fritz 13 SE engine used in that package...couldn't find any info on it. Typical of Chessbase.

You might want to check the engine configuration settings to make sure you have a decently sized hash table. If it's set to 1 MB, it might drastically weaken the engine even more.
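
For a UCI engine, the hash size is normally exposed as a `Hash` spin option and changed with `setoption`. The sketch below parses an option line of the kind a UCI engine prints at startup and builds the matching command. The sample line is hypothetical, not taken from Fritz, and Fritz's own engine may only be configurable through the GUI.

```python
def parse_spin_option(line: str):
    """Parse a UCI 'option name ... type spin ...' line into (name, default, min, max)."""
    tokens = line.split()
    if tokens[:2] != ["option", "name"] or "type" not in tokens:
        raise ValueError("not a UCI option line")
    type_idx = tokens.index("type")
    if tokens[type_idx + 1] != "spin":
        raise ValueError("not a spin option")
    name = " ".join(tokens[2:type_idx])
    # Remaining tokens come in key/value pairs: default N min N max N.
    fields = dict(zip(tokens[type_idx + 2::2], tokens[type_idx + 3::2]))
    return name, int(fields["default"]), int(fields["min"]), int(fields["max"])


def set_hash_command(option_line: str, wanted_mb: int) -> str:
    """Build a setoption command, clamped to the range the engine advertises."""
    name, _default, lowest, highest = parse_spin_option(option_line)
    value = max(lowest, min(highest, wanted_mb))
    return f"setoption name {name} value {value}"


# Hypothetical option line, for illustration only.
sample = "option name Hash type spin default 1 min 1 max 1024"
print(parse_spin_option(sample))
print(set_hash_command(sample, 256))
```

If the default reported by the engine turns out to be something tiny like 1 MB, that would be worth raising before drawing conclusions about playing strength.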

Other than that, I don't know what to suggest.

TrumanB

If you want a real estimate of your strength, I would recommend Lucas Chess with the Stockfish 6 engine. It's free and pretty austere with ratings.

ipcress12

According to the ad promo at Chessbase:

Don’t worry though – Fritz isn’t just a checkmating machine. The program automatically adjusts its play to suit any rating, while the built-in chess coach function explains moves and positions, gives tips, points out hidden dangers and also provides detailed opening statistics.

http://shop.chessbase.com/en/products/fritz_for_fun_13_english

27.c3 is a flat blunder that I doubt any program of the last 20 years would have made.

It says "Fritz for Fun." It's no fun losing over and over again to a machine.

EscherehcsE
ipcress12 wrote:

According to the ad promo at Chessbase:

Don’t worry though – Fritz isn’t just a checkmating machine. The program automatically adjusts its play to suit any rating, while the built-in chess coach function explains moves and positions, gives tips, points out hidden dangers and also provides detailed opening statistics.

http://shop.chessbase.com/en/products/fritz_for_fun_13_english

27.c3 is a flat blunder that I doubt any program of the last 20 years would have made.

It says "Fritz for Fun." It's no fun losing over and over again to a machine.

Hi ipcress12

Yeah, I saw the promo material. If this GUI is like the other Fritz GUIs, I think the automatic rating adjustment is part of the "friend mode". It's great if you want to find a level that can give you a fairly even game. Unfortunately, I don't think it gives you an elo level in that mode; I think it just adjusts a handicap level, expressed in centipawns.

Yeah, I agree that it's no fun losing over and over again to a machine. I guess what the OP is complaining about is that the program is providing him with too much fun. :)

hhnngg1

I don't have it in "friend" mode, at least I don't think so.

 

When I hit "rated" game, there is an ELO slider where you can adjust it from noob 200 to GM3000+ strength, and I placed it at 1800s.

 

For "unrated" games and "tactical training" games, it will adjust its playing strength throughout the game to your level, and in the tactical mode, it will intentionally set up tactical kills for you (that you have to catch).

 

For sure, if I play a "rated" game with the Fritz slider set at 3000+ ELO, it plays like a supercomputer - it doesn't dumb down to my lowly level then, which is why I was surprised it was so beatable at set 1800. 

 

In terms of the "blunder", we all know that at handicapped non-max ELO levels, every engine has to intentionally blunder to make itself playable at non super-GM strength; the 27.c3 move was indeed a horrible blunder, but it's not any worse than the truly horrific ones that handicapped engines make - often they'll play like a true GM for 10 moves, then hang a N outright for zero compensation, then go back to GM strength, which is the big critique of nonhumanlike engine play.

 

Despite this problem, I still learn more from playing an engine set slightly stronger than my rating than from repeatedly playing slightly weaker chess.com humans. As you probably all know, your odds of playing slightly stronger opponents a majority of the time here are zero. I get stronger-rated players only about 1/4 of the time; it seems a lot of folks set their filters to screen out lower-rated opponents.

kleelof
hhnngg1 wrote:

Honestly, I'm surprised at the notion that Fritz-tuned ratings should be hundreds of points off reality - 

Again, ratings from whatever source you are getting them from are ONLY GOOD FOR THAT POOL.

You should not take those 'ratings' given by software as anything other than arbitrary and useless.

hhnngg1

While I somewhat agree, it would still be surprising to me to find that such a well-regarded program as Fritz would be off by over 200 rating points compared to a USCF elo rating, given that Fritz has been one of the top software standards for USCF competitive chess player analysis software and well regarded by chessplayers.

ipcress12

Again, ratings from whatever source you are getting them from are ONLY GOOD FOR THAT POOL.

You should not take those 'ratings' given by software as anything other than arbitrary and useless.

Strictly speaking yes; in practice no.

Even though it is mathematically true that a rating system is only valid for its pool, the fact is that humans set up rating systems and tweak them, to be in accord with community conceptions of how well a 1600 player, for example, performs.

You can find teachers like Heisman and Silman explaining their views on the developmental stages of chess players based on their USCF rating.

And if rating numbers seem to be out of whack with those conceptions, the guys behind the rating system will tweak the rating system, rather than issue notices that the rating numbers are arbitrary and players should have no expectations that their USCF, FIDE or chess.com ratings be in line.

I don't know how well Fritz does in estimating a user's rating, but I'll bet the software for that gets better over time, and the day is coming when players can get a decent, though rough, idea of where they stand, whether they play official rated games or not.

kleelof
hhnngg1 wrote:

While I somewhat agree, it would still be surprising to me to find that such a well-regarded program as Fritz would be off by over 200 rating points compared to a USCF elo rating, given that Fritz has been one of the top software standards for USCF competitive chess player analysis software and well regarded by chessplayers.

It all has to do with pools. A rating really only means something in its pool.

For example, my Online Chess rating (multiple-days-per-move games) is 16??. However, there is no way any FIDE or USCF rating I could have would be this high.

'Fritz has been one of the top software standards for USCF competitive chess player analysis software and well regarded by chessplayers'

When playing, these engines analyze the position using the same processes they use during analysis. Of course, they do this much faster and better than you. So, to compensate, the software has to dumb the engine down and add an element of fallibility to its play.

The problem with this is that its 'errors' are random. Imagine if you were playing a game and every 6 moves or so, you had to let your 6-year-old nephew move a piece. You would have the option of limiting him to 3 candidate moves. :)

This is what engines do. They randomly select from a few moves. When you change the rating, the software makes adjustments to this to allow for more 'errors'.
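
The scheme described above can be sketched in a few lines. This is only a toy model of the idea as stated in this thread, not how Fritz is known to work; the function name, the `error_rate` knob, and the sample evaluations are all made up for illustration.

```python
import random


def pick_move(scored_moves, error_rate, top_k=3, rng=random):
    """Pick a move the way a crudely weakened engine might.

    scored_moves: list of (move, eval_in_centipawns) from the engine's side.
    error_rate: chance per move of deliberately not playing the best move
                (a lower rating setting would mean a higher error_rate).
    top_k: how many of the top-ranked moves are considered "candidates".
    """
    if top_k < 2:
        raise ValueError("top_k must be at least 2 to allow any error")
    ranked = sorted(scored_moves, key=lambda item: item[1], reverse=True)
    if len(ranked) > 1 and rng.random() < error_rate:
        # Choose among the 2nd..top_k-th moves without checking how bad they are.
        return rng.choice(ranked[1:top_k])[0]
    return ranked[0][0]


# Hypothetical position where only the top move holds the balance.
moves = [("Nf3", 20), ("Bxf2+", -350), ("Qh4", -500), ("a6", -900)]
print(pick_move(moves, error_rate=1 / 6, rng=random.Random(0)))
```

Because the picker never looks at how far the 2nd or 3rd move falls below the best one, an "error" can just as easily be a piece thrown away for nothing as a subtle inaccuracy, which is the inhuman pattern people are describing here.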

They say chess is a game of errors. But for computers, it is a game of perfection except when they are required to err.

Because of this, you could never test if the way an engine is being adjusted for '1800' really matches FIDE or USCF ratings because it could, potentially, make an 'error' that is actually well outside the range of an error a player rated 1800 would actually make.


I'm sure the people at Fritz did some basic targeting for these numbers. But, in the end, they went with the fact that most chess players still don't realize there is no such thing as a standard for chess ratings. It all boils down to pools and other variables.

 



ipcress12

Sixty years ago no one figured chess programs would beat super GMs using refined, but brainless minimax algorithms.

Similarly, no one figured a jumped-up database program could become a Jeopardy champion.

Do you think chess programmers could write a program now which could reliably distinguish a 1200 player from a 2200 player?

I do too.

It's just a matter of time as researchers divide and conquer their way to programs which will effectively estimate chess ratings plus or minus fifty points, maybe better.

EscherehcsE
hhnngg1 wrote:

In terms of the "blunder", we all know that at handicapped non-max ELO levels, every engine has to intentionally blunder to make itself playable at non super-GM strength; the 27.c3 move was indeed a horrible blunder, but it's not any worse than the truly horrific ones that handicapped engines make - often they'll play like a true GM for 10 moves, then hang a N outright for zero compensation, then go back to GM strength, which is the big critique of nonhumanlike engine play.

A good handicapped engine, in addition to introducing randomization for the purpose of occasional blundering, will also limit the number of nodes searched. So a well-designed handicapped engine will NEVER play like a GM. (There are probably badly designed handicapped engines that do play like a GM most of the time.)

ipcress12

I also think computer programs will get better at simulating human play at various rating levels.

hhnngg1
EscherehcsE wrote:
hhnngg1 wrote:

In terms of the "blunder", we all know that at handicapped non-max ELO levels, every engine has to intentionally blunder to make itself playable at non super-GM strength; the 27.c3 move was indeed a horrible blunder, but it's not any worse than the truly horrific ones that handicapped engines make - often they'll play like a true GM for 10 moves, then hang a N outright for zero compensation, then go back to GM strength, which is the big critique of nonhumanlike engine play.

A good handicapped engine, in addition to introducing randomization for the purpose of occasional blundering, will also limit the number of nodes searched. So a well-designed handicapped engine will NEVER play like a GM. (There are probably badly designed handicapped engines that do play like a GM most of the time.)

Ok, true, but you get my drift. The computer set at '1800' will make a horrendous blunder that no human would ever make (literally will sometimes play a suicidal BxP+ sacrifice with zero compensation like in the above game I posted), and then play moves that are in the top 1-3 moves of the engine, even in subtle positions until it catches up, and then repeats blunderville again.  

 

Weirdly though, even though it's totally NOT human-type play, it's good practice to mix it up against a silicon opponent who plays unexpected styles of chess. I've also found that for the most part, silicon Fritz plays pretty consistently if you play out the game to the end, meaning I won't crush him one game and then be totally crushed the next game; the blunders on both sides sort of even out so that it feels like similar strength for the most part.

EscherehcsE

Well OK, I hope you solve your problem. :)