Math People Only!: Changes to how much ratings change... - Chess Forums - Page 7

erik · 2009-10-15T15:48:58-07:00

Ok. There has been talk that ratings move up/down too much. It doesn't bother me, but I know it bugs some people. As some of you know, we use the Glicko system (http://en.wikipedia.org/wiki/Glicko_rating_system and http://math.bu.edu/people/mg/research/gdescrip.pdf ). Here is what we use to start:

IM Kacparov

Oct 28, 2009

0

#121

It may not work with 350, but should work with lower ones.

dsji

Oct 28, 2009

0

#122

http://math.bu.edu/people/mg/ratings/rating.system.pdf

This is the Glicko system as implemented by the USCF.

We may want to remove section 3.1 regarding match play, but I think the restrictions on playing within +/- 400 points of current rating are doable. Since tables of expected change are available in the public domain, the changes that Glicko-2 made (a volatility index that can be used to measure confidence in the RD and the expected rate of change) can be implemented easily.

Set the rating period to an average of thirty games. After thirty games, convert the ratings to Glicko-2. (This can be used as a way to spot potentially suspicious players for further scrutiny, as they become outliers before they become outrageously high rated.) It also allows ratings to adjust over time in a more evenhanded fashion, as opposed to allowing earlier performances to have a greater effect.

http://math.bu.edu/people/mg/glicko/glicko2.doc/example.html

this is glicko-2

IM Kacparov

Oct 29, 2009

0

#123

I don't

Baseballfan

Oct 29, 2009

0

#124

phillliesarethebest wrote:

It makes sense in sports. For teams like baseball, for example, you have a team record and a winning percentage, and they are equal. Well, I think that should be the same for chess.

The assumption with sports teams is that teams in the same league have roughly the same talent. You don't have 25 guys straight out of HS playing 25 seasoned veterans. In chess, you can have players who are at completely different levels playing each other. If my rating is 1200, and I beat someone who is rated 2000, certainly that means more than if I beat someone rated 800, and a rating system like this one tries to reflect those differences.
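To put rough numbers on that, here is a minimal sketch using the standard logistic expected-score curve that Elo-style systems are built on. It ignores RD entirely, and the function name `expected_score` is made up for illustration; it is not the site's code.

```python
def expected_score(rating, opponent_rating):
    """Logistic expected score on the usual 400-point scale (win = 1, draw = 0.5, loss = 0)."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


# A 1200 player facing a 2000 and an 800, as in the post above.
for opponent in (2000, 800):
    e = expected_score(1200, opponent)
    # The rating gain for a win is proportional to (1 - e): the bigger the upset, the bigger the gain.
    print(f"vs {opponent}: expected score {e:.3f}, win surprise {1 - e:.3f}")
```

The win against the 2000 was expected about 1% of the time, while the win against the 800 was expected about 91% of the time, which is why the first one moves the rating so much more.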

ichabod801

Nov 1, 2009

0

#125

jay wrote:

Alright, I have put the minimum K value back in, even though Mark Glickman had no idea why that was there, although FICS does use it, and apparently he helped them implement their formulas. Without the min K value, people with low RDs, their ratings just don't move at all (like the computers in live chess). I've also created a calculator you can use to test various scenarios. Please, math people, get in there and run some tests and let me know if the output looks correct. It certainly doesn't feel correct at times.

Hey, Jay, sorry I took so long to check this out. I did some quick tests this morning, just 6 random tests, 3 of them constrained to give close rated players with low RDs. I ran the calculations through my Python version of your code and the web calculator, and the calculations check out. Subjectively, the changes look fine to me as well, but the real test is how they play out over time.
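For anyone who wants to run similar checks, below is a minimal sketch of a one-game Glicko-1 update, following the formulas in Glickman's description linked at the top of the thread. It is not Jay's code or the web calculator: it has no minimum K value (the thread does not give one), no RD growth over time, and the name `glicko_update` is an assumption made for illustration.

```python
import math

Q = math.log(10) / 400.0  # Glicko-1 scaling constant


def g(rd):
    """Discount factor for an opponent's rating deviation."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * rd**2 / math.pi**2)


def glicko_update(r, rd, opp_r, opp_rd, score):
    """One-game Glicko-1 update; returns (new_rating, new_rd)."""
    g_opp = g(opp_rd)
    expected = 1.0 / (1.0 + 10 ** (-g_opp * (r - opp_r) / 400.0))
    d2_inv = Q**2 * g_opp**2 * expected * (1.0 - expected)  # this is 1 / d^2
    denom = 1.0 / rd**2 + d2_inv
    new_r = r + (Q / denom) * g_opp * (score - expected)
    new_rd = math.sqrt(1.0 / denom)
    return new_r, new_rd


# Same win against an equal-rated opponent, with three different RDs for the winner.
for rd in (350, 100, 30):
    new_r, new_rd = glicko_update(1500, rd, 1500, 50, 1.0)
    print(f"RD {rd}: rating change {new_r - 1500:+.1f}, new RD {new_rd:.1f}")
```

The lower the player's own RD, the smaller the change for the same result, which is the "ratings just don't move" effect that the minimum K value is meant to counter.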

zankfrappa

Nov 10, 2009

0

#126

I know you all are busy, but I wanted to mention something about my Tactics Trainer rating. I went 14-10 today and my rating still went down!!!

I have now lost about 600 points since October 1st, down from 2295. I lose about 16 points for a miss and gain about 8 points for a correct answer.

It seems we have gone from too drastic a change per problem to too little; perhaps there is a happy medium.

Thank you.

Rookbuster

Nov 10, 2009

0

#127

This might sound like a dumb question, but if your RD is lower, does that mean that you are actually closer to the rating that is shown? For example, I'm currently rated 1638 with an RD of 56, which would mean my playing strength is between 1526 and 1750, but a friend has nearly the same rating with an RD of 79.

ExtraBold

Nov 16, 2009

0

#128

Rookbuster, the RD is like a standard deviation, so your "true" rating would have a 95% probability of being within twice the RD of your rating, in your case between 1526 and 1750.
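The arithmetic is just rating plus or minus two RDs. A minimal sketch, where `rating_interval` is a made-up helper name and the friend is assumed to be rated exactly 1638 as well (the post only says "nearly the same"):

```python
def rating_interval(rating, rd, z=2.0):
    """Approximate 95% interval: rating plus or minus z rating deviations."""
    return rating - z * rd, rating + z * rd


print(rating_interval(1638, 56))  # (1526.0, 1750.0)
print(rating_interval(1638, 79))  # (1480.0, 1796.0), assuming the same 1638 rating
```

So the lower RD gives a band 224 points wide, against 316 points for the friend.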

Staff, going forward, is the data accessible (to you) to see whether, say, the average score against players 100 or 200 points higher, across a large sample of games, is similar to the expected score for such games predicted by the Glicko formulae? This could give some feedback on how good our sigma is, and be a cheap and cheerful alternative to the intensive data analysis that might be used to estimate sigma.
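A check like that could be sketched roughly as follows: group games by rating gap and compare the average actual score in each group with the average score predicted by the Glicko-1 expectation formula. The tuple layout, the names `glicko_expected` and `calibration_table`, and the sample rows are all assumptions made for illustration; the rows are invented only to show the call and are not real game data.

```python
import math
from collections import defaultdict

Q = math.log(10) / 400.0  # Glicko-1 scaling constant


def glicko_expected(r, opp_r, opp_rd):
    """Glicko-1 expected score against an opponent with rating opp_r and deviation opp_rd."""
    g = 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * opp_rd**2 / math.pi**2)
    return 1.0 / (1.0 + 10 ** (-g * (r - opp_r) / 400.0))


def calibration_table(games, bin_width=100):
    """Compare mean actual and mean predicted score per rating-gap bin.

    games: iterable of (rating, opp_rating, opp_rd, score) tuples.
    Returns {bin_start: (n_games, mean_actual, mean_predicted)}.
    """
    bins = defaultdict(list)
    for r, opp_r, opp_rd, score in games:
        gap = opp_r - r  # positive when the opponent is higher rated
        start = math.floor(gap / bin_width) * bin_width
        bins[start].append((score, glicko_expected(r, opp_r, opp_rd)))
    table = {}
    for start, rows in sorted(bins.items()):
        n = len(rows)
        mean_actual = sum(s for s, _ in rows) / n
        mean_predicted = sum(e for _, e in rows) / n
        table[start] = (n, mean_actual, mean_predicted)
    return table


# Invented rows, only to demonstrate the call.
sample = [
    (1500, 1620, 60, 0.0),
    (1500, 1650, 80, 1.0),
    (1400, 1590, 50, 0.5),
    (1700, 1910, 70, 0.0),
]
for start, (n, actual, predicted) in calibration_table(sample).items():
    print(f"gap {start}..{start + 100}: n={n}, actual {actual:.3f}, predicted {predicted:.3f}")
```

With a large real sample, a persistent difference between the actual and predicted columns in some bin would suggest the formulas are miscalibrated for that rating gap.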

Atos

Nov 17, 2009

0

#129

Rookbuster wrote:

This might sound like a dumb question, but if your RD is lower, does that mean that you are actually closer to the rating that is shown? For example, I'm currently rated 1638 with an RD of 56, which would mean my playing strength is between 1526 and 1750, but a friend has nearly the same rating with an RD of 79.

Your rating is what it is; it is exactly 1638. The RD is supposed to indicate the degree of uncertainty about your real "playing strength." The questions that could arise here are, e.g., whether uncertainty can be measured with mathematical precision, whether ratings can ever be an accurate measurement of playing strength (which is also not fixed but can vary greatly from day to day), and whether factoring in things other than the ratings themselves would help to measure playing strength more accurately.

cookie3

Mar 5, 2010

0

#130

Why is the starting # for players set at 1200? In another article, it was stated that the average rating at chess.com was just over 1300; shouldn't this then be the starting #? Also, there are a great many people who will give up on the game when their own continued growth requires work, so why not set the beginning # of games to a higher amount? Instead of 10 games, maybe raise that number to, say, 100 games. This way, beginners don't affect experienced players as greatly, beginners' ratings would be more accurate, and the R.D. # would possibly be lower. Anyway, thanks to Chess.com! All is greatly appreciated!

Muhammad333

Apr 8, 2010

0

#131

I think that http://www.chess.com/ should use Elo ratings instead of Glicko.

catholicbatman

Apr 8, 2010

0

#132

shadowc wrote:

I agree on NOT computing a rating for new members until several games are played, so if I play a new member who is bound to be rated 2100 in the future and he beats me, I'm not going to lose 300 points to him if he's still 1200 when I play.

As for the other math, I'm no specialist.

Good point there. I know what you mean about losing to someone who is bound to be super good.

ExtraBold

Apr 13, 2010

0

#133

Glicko covers that. If you play a new member they have a high RD, so your rating changes only a little.
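The mechanism is the g(RD) factor in Glickman's Glicko-1 description, which scales down the weight of a result according to the opponent's RD. A minimal sketch of just that factor follows; how much smaller the actual rating change ends up also depends on both ratings and on your own RD, and the site's implementation may differ.

```python
import math

Q = math.log(10) / 400.0  # Glicko-1 scaling constant


def g(rd):
    """Glicko-1 weight given to a result against an opponent with rating deviation rd."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * rd**2 / math.pi**2)


# An established opponent, a fairly active one, and a brand-new account at RD 350.
for opp_rd in (30, 100, 350):
    print(f"opponent RD {opp_rd}: weight g = {g(opp_rd):.3f}")
```

A result against an RD 350 opponent carries roughly two thirds of the weight of the same result against a well-established opponent, so the change is damped rather than eliminated.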

pdela

Apr 18, 2010

0

#134

Maybe for online chess the rating should start at 1400.

Vance917

Apr 22, 2010

0

#135

Vance917 wrote:

Along the same lines, you might consider having the two scores determine the ratio, but the players can agree to an amount wagered. As in any gambling situation (usually for money, but here for points), the odds are given by the house, but the player can still decide how much to bet. If the odds are 2:1, then I can bet $1 to lose $1 or win $2, or I can bet $2 to lose $2 or win $4, and so on. Likewise here, a formula based on scores might say that Player A can win 3X by winning or X by drawing, and Player B can win X by winning. Now the two players can determine X, or whoever issues the challenge can do so.

Gee, that guy had a good idea. Wish I'd thought of it. OK, seriously, I take no pleasure in quoting my own post, but this was not a frivolous idea, and I had sort of hoped that it might generate at least some discussion.

mathijs

Apr 22, 2010

0

#136

Vance, it's an interesting idea, but it suspends the notion of a rating system as a system of assessing playing strength. I know most people don't think of ratings that way anymore, but that's what they're set up to do; what they're designed for.

Vance917

Apr 22, 2010

0

#137

Thank you Mathijs, but we can already circumvent that aspect of it by playing unrated games, which may be understood to be nothing more than one extreme on the continuum that I am proposing.

mathijs

Apr 22, 2010

0

#138

The fact that unrated games can be construed as one extreme of your suggested continuum is not really relevant. The rating system you suggest is not an assessment of playing strength, whereas the Elo and Glicko systems (and the like) are. The unrated games are just not part of those systems.

Vance917

Apr 22, 2010

0

#139

Maybe ...

Puchiko

Apr 22, 2010

0

#140

I remember that when playing Go on KGS, the site would have you set up your initial approximate rating. Of course, if you lied and set up a high one, you'd quickly be shot down by the high rated players who wanted to play with you, and overall, the system was quite accurate.

Starting everyone out on 1200 doesn't really help accuracy: a player of 1800 strength will take some time to get to that rating, especially in live chess, where it's hard for such people to face strong opponents.