Math People Only!: Changes to how much ratings change... - Chess Forums - Page 5

erik · 2009-10-15T15:48:58-07:00

Ok. There has been talk that ratings move up/down too much. It doesn't bother me, but I know it bugs some people. As some of you know, we use the Glicko system (http://en.wikipedia.org/wiki/Glicko_rating_system and http://math.bu.edu/people/mg/research/gdescrip.pdf ). Here is what we use to start:

costelus

Oct 16, 2009

0

#81

[COMMENT DELETED]

jay

Oct 16, 2009

0

#82

So maybe the problem in my formulas is simply that I am not squaring c when calculating the rise in RD for RD'. I have:

$fltPlayer1RdPrime = sqrt(pow($fltPlayer1Rd,2) + (UserGameRating::$fltCValue * $intMinutes));

But if this formula is taken from this idea:

$$350 = \sqrt{50^2 + c^2(30)}.$$

Then perhaps my code should actually be:

$fltPlayer1RdPrime = sqrt(pow($fltPlayer1Rd,2) + (pow(UserGameRating::$fltCValue,2) * $intMinutes));

That would have the same end result as what we've been talking about, lowering the c value, since 0.2 squared gives us 0.04.

extraBold, are you saying my code is correct in that I should use RD' in my calculation of their new RD, and not just their old RD?
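To put numbers on the difference, here is a minimal Python sketch of the two variants of the RD' step. It assumes c = 0.2 with time measured in minutes (the figures mentioned in this thread). The cap at 350 comes from Glickman's description rather than from the PHP above, and the function name is made up for illustration.

```python
import math

MAX_RD = 350.0  # assumed cap: the RD of a brand-new player


def inflate_rd(rd, c, minutes, square_c=True):
    """Return RD' after `minutes` without a rated game.

    square_c=True  -> sqrt(RD^2 + c^2 * t)  (Glickman's form)
    square_c=False -> sqrt(RD^2 + c * t)    (the unsquared variant)
    """
    growth = (c ** 2 if square_c else c) * minutes
    return min(math.sqrt(rd ** 2 + growth), MAX_RD)


one_day = 24 * 60  # minutes
print(round(inflate_rd(50, 0.2, one_day, square_c=False), 2))
print(round(inflate_rd(50, 0.2, one_day, square_c=True), 2))
```

With c below 1 the unsquared variant always inflates RD faster, which matches the complaint that ratings swing too much.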

kama-COFFEE08

Oct 16, 2009

0

#83

O.O

i should have never strayed in here..

ExtraBold

Oct 17, 2009

0

#84

jay wrote:

extraBold, are you saying my code is correct in that I should use RD' in my calculation of their new RD, and not just their old RD?

Yes. If you didn't use both formulae then RD could not go up and down.

Squaring c sounds good if you are not doing that already.

IM Kacparov

Oct 17, 2009

0

#85

I think that the RD should be changing more - both going down more if you finish a game and going up more per day. Getting from 50 to 100 takes 130 days, that's 0.38 per day. On other sites (won't name them because my post will be deleted or moderated) it's 2 or 3 up per day. And the minimum is about 30, while here it seems to be over 40 (look at Awardchess).
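Note that 0.38 per day is an average. Under the Glicko form RD' = sqrt(RD^2 + c^2 * t), RD grows with the square root of elapsed time, so it climbs fastest right after a game and slows down later. A rough Python sketch, taking the 130-day figure above at face value and measuring t in days (both are assumptions, and the function names are illustrative only):

```python
import math


def implied_c_squared(rd_start, rd_end, days):
    """c^2 per day implied by RD growing from rd_start to rd_end in `days`."""
    return (rd_end ** 2 - rd_start ** 2) / days


def rd_after(rd_start, c_squared, days):
    """RD after `days` without a finished game: sqrt(RD^2 + c^2 * t)."""
    return math.sqrt(rd_start ** 2 + c_squared * days)


c2 = implied_c_squared(50, 100, 130)  # (100^2 - 50^2) / 130
first_day = rd_after(50, c2, 1) - 50  # growth during day 1
last_day = rd_after(50, c2, 130) - rd_after(50, c2, 129)  # growth during day 130
print(round(c2, 2), round(first_day, 2), round(last_day, 2))
```

So "2 or 3 per day" on another site may be comparable or not, depending on which part of the curve is being measured.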

I also think it's unfair that basic members can't see their RD. It's a very important stat and should be available to everyone.

jonnyjupiter

Oct 17, 2009

0

#86

Using Awardchess as an example proves that the RD is set too high. I'll leave it to all you maths guys to figure out how to change it.

It seems that applying the same system for games with vastly different time controls might be the problem. A bullet game might take 2 minutes, but a turn-based game might take a year to complete.

When you play live chess, you play the game, your rating changes. You play 10-20 games and your rating settles to where it should be. You go off for 2 months, come back to play another game and your rating is clearly unreliable - the current system works fine for this.

However, in online chess, even though you might not have finished a game for 2 months, you could still have been making moves in many games - at 3 days per move this might only have been 10 moves for each player. Since the games were ongoing, it is likely you are playing at something close to your currently rated strength for all these ongoing games, but the RD is changing while you are playing your game(s). It seems that the premise is flawed if the reliability of my rating fluctuates while I am in the middle of the game.

As online chess is essentially a live game stretched out over a much longer time period, I think the system needs to be less granular as a result - thus longer time values with other values adjusted to compensate.

ichabod801

Oct 17, 2009

0

#87

jay wrote:

So maybe the problem in my formulas is simply that I am not squaring c when calculating the rise in RD for RD'. I have:

$fltPlayer1RdPrime = sqrt(pow($fltPlayer1Rd,2) + (UserGameRating::$fltCValue * $intMinutes));

But if this formula is taken from this idea:

$$350 = \sqrt{50^2 + c^2(30)}.$$

Then perhaps my code should actually be:

$fltPlayer1RdPrime = sqrt(pow($fltPlayer1Rd,2) + (pow(UserGameRating::$fltCValue,2) * $intMinutes));

That would have the same end result as what we've been talking about, lowering the c value, since .2 squared gives us .04

extraBold, are you saying my code is correct in that I should use RD' in my calculation of their new RD, and not just their old RD?

Jay, I've looked over your code (your uncommented, Hungarian code). The c value should be squared, and you are not squaring it, so that is an error. Also, given that c is less than 1, c is larger than c^2, so the unsquared version is generally going to inflate RDs more than it should. In your defense, that is an error on the FICS page that you referenced. You implemented their algorithm correctly, but their algorithm was wrong.

Everything else in your code looks fine to me (except maybe the Hungarian part). You are using RD' throughout the calculation, which is what you should be doing, so that's fine.

Later today I'll try to convert your code to Python so I can test it against my code. I'll test both the c and the c^2 versions of your code so that we can see what sort of an effect that has.
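For anyone who wants to try the same comparison, here is a minimal Python sketch of a one-game Glicko update following the formulas in Glickman's description. It is not a port of Jay's actual code. The `square_c` switch, the cap at 350, the function names and the sample numbers are all assumptions made for illustration, and the opponent's RD is taken as already adjusted for elapsed time.

```python
import math

Q = math.log(10) / 400  # Glicko scaling constant q


def g(rd):
    """Attenuation factor for an opponent's rating deviation."""
    return 1 / math.sqrt(1 + 3 * Q ** 2 * rd ** 2 / math.pi ** 2)


def update(r, rd, opp_r, opp_rd, score, c, t, square_c=True):
    """One-game Glicko update; returns (new rating, new RD).

    score is 1 for a win, 0.5 for a draw, 0 for a loss.
    """
    # Step 1: inflate the old RD for elapsed time, giving RD'.
    growth = (c ** 2 if square_c else c) * t
    rd_prime = min(math.sqrt(rd ** 2 + growth), 350)
    # Step 2: expected score against this opponent.
    g_opp = g(opp_rd)
    expected = 1 / (1 + 10 ** (-g_opp * (r - opp_r) / 400))
    # Step 3: RD' (not the old RD) feeds both the rating and the RD update.
    denom = 1 / rd_prime ** 2 + Q ** 2 * g_opp ** 2 * expected * (1 - expected)
    new_r = r + (Q / denom) * g_opp * (score - expected)
    new_rd = math.sqrt(1 / denom)
    return new_r, new_rd


week = 7 * 24 * 60  # minutes since the player's last rated game
for square_c in (False, True):
    print(square_c, update(1500, 50, 1500, 50, 1, c=0.2, t=week, square_c=square_c))
```

Because RD' enters the update, a long enough gap can leave the new RD above the old one even after a game is rated, which is how RD manages to move both up and down.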

mathijs

Oct 17, 2009

0

#88

I've been mulling over the equations in Glickman's article and I think I have some understanding of the workings of the Glicko system now.

First, what are the parameters at our disposal (there are two articles used in this discussion that use different names for parameters, http://math.bu.edu/people/mg/glicko/glicko.doc/glicko.html and http://www.freechess.org/Help/HelpFiles/glicko.html. I refer to them as Glicko and Fics respectively. I've mainly used Glicko.)

-Initial rating (1200) and initial rating deviation (350). As far as I can see it doesn't matter what numbers you pick for these, it just determines (though not entirely) what will be the mode rating and how widely ratings will be spread around that rating.

-Rating period (t) and c-value. Since it appears that our ratings are updated immediately after a game finishes, the rating period is only relevant to the question of how long a period of inactivity it takes for us to be completely in the dark about a player's ability. It has been suggested that five years would be that period, so I suggest that for ease of calculation you set c=c^2=1 and therefore (assuming that an active player has RD=50) t=5 years/(350^2-50^2). Of course you can take any pair of c and t that leads to a period of five years before we're completely uncertain. That's the only effect of c and t.

-The q- and p-value (mentioned by erik in the OP, p only appears in Fics). As far as I can see q is a constant chosen for some kind of normalization. p isn't a separate value, it is just 3*q^2/pi^2, that is, a constant times q squared. I'm not sure why exactly these values are chosen, but I don't think we should mess with them, at least not without consulting Professor Glickman.

-The K-value (this only really appears in Fics, because in Glicko ratings are only updated once per rating period. If you allow only one game per rating period in the Glicko article (that is, if you take the rating period as sufficiently short), then his formulas simplify to the Fics case and you can isolate a K-value. Of course, in our case there is at most one game per period, as updates are instantaneous). This is actually not a fixed parameter (as erik suggested in his OP), but it is stipulated to be no less than some number, 16 in this case. The effect of setting this minimum is that there will always be some change possible (no matter how small the RD), so as to account for the possibility of quickly improving (or deteriorating) players.
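The arithmetic for the rating period can be checked with a few lines of Python. This is only a sketch under the assumptions stated in the list above (active RD of 50, initial RD of 350, five years to full uncertainty). The choice of minutes as the time unit and the value q = ln(10)/400 from Glickman's description are additions for illustration.

```python
import math

ACTIVE_RD = 50    # assumed RD of an active player
INITIAL_RD = 350  # RD of a completely unknown player
YEARS = 5         # suggested time to reach full uncertainty

# With c = 1, 350^2 = 50^2 + c^2 * n, so n rating periods are needed:
periods = INITIAL_RD ** 2 - ACTIVE_RD ** 2  # 120000
minutes_in_span = YEARS * 365.25 * 24 * 60
t_minutes = minutes_in_span / periods  # length of one rating period

# Equivalent choice: keep one-minute periods and solve for c instead.
c_per_minute = math.sqrt(periods / minutes_in_span)

# The q and p constants as described above.
q = math.log(10) / 400
p = 3 * q ** 2 / math.pi ** 2

print(periods, round(t_minutes, 1), round(c_per_minute, 3))
```

Either pair (c = 1 with the computed period length, or one-minute periods with the computed c) reaches RD 350 from RD 50 after the same five years, which is the sense in which only the combination of c and t matters.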

In general, I don't think the parameters matter much; they usually (unless noted otherwise) only affect what number indicates what strength (for instance, we could set them so that weak players would have a rating of 2500 and strong players of 4000, that sort of thing).

Now some misunderstandings I came across reading the forum.

-When you're inactive for a period of time your rating does not decrease, but your RD increases. This means that your rating will be treated as more uncertain and new results will have a greater impact. (many people seemed confused on this matter).

-When you play players with a much lower rating, but a very high RD (e.g., players that are much stronger than their rating suggests), a loss against them will have only a small impact on your rating (although it will have a big impact on theirs). Their high RD has the effect that your rating will drop less than it would against a player with the same rating but a smaller RD. (Valentin seems to have gotten this the other way around.)
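The second point can be illustrated with the one-game Glicko rating change from Glickman's description. In the Python sketch below the ratings and RDs are hypothetical numbers picked for the example, and RD inflation for elapsed time is left out to keep it short.

```python
import math

Q = math.log(10) / 400


def rating_change(r, rd, opp_r, opp_rd, score):
    """Rating change from one game under the standard Glicko update."""
    g = 1 / math.sqrt(1 + 3 * Q ** 2 * opp_rd ** 2 / math.pi ** 2)
    expected = 1 / (1 + 10 ** (-g * (r - opp_r) / 400))
    denom = 1 / rd ** 2 + Q ** 2 * g ** 2 * expected * (1 - expected)
    return (Q / denom) * g * (score - expected)


# Hypothetical: an 1800 player (RD 60) loses to a 1400-rated opponent.
loss_vs_settled = rating_change(1800, 60, 1400, 50, 0)     # opponent RD 50
loss_vs_uncertain = rating_change(1800, 60, 1400, 300, 0)  # opponent RD 300
# The same game seen from the uncertain 1400 player's side.
upset_gain = rating_change(1400, 300, 1800, 60, 1)
print(round(loss_vs_settled, 1), round(loss_vs_uncertain, 1), round(upset_gain, 1))
```

The loss to the high-RD opponent costs fewer points than the loss to the settled one, while the high-RD winner's own rating moves by far more.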

Valentin also brings up another interesting question: when strong players lose a bunch of games on time, a win against them is only meekly rewarded. He points to one part of the problem, that ratings fluctuate during the course of a game (a problem very specific to correspondence games). I will come back to that, but first I want to look at what I think is the real issue here, that is, while the Glicko system (and the Elo and all related systems, for that matter) is designed as a way of measuring ability, ratings are treated in practice as a bonus in and of themselves, a sort of reward. So when a player loses a bunch of games on time, his rating plummets, although these time-outs have no implications for his actual playing strength. (I don't have an opinion on whether this is a good thing, it's just an observation). If you want to maintain both the reward system and the strength measuring features, you could use a shadow rating, in which you don't count time-outs, to calculate rating changes to the actual ratings. For instance, if a 2000 player loses some games on time, his rating might plummet to 1500, but if he then beats, for instance, an 1800 player, the effect on their ratings will be that of a 2000 player beating an 1800 player. Basically the 2000 player would get a 500 point subtraction from his reward rating. The effect of this would of course be to thoroughly separate reward ratings from strength measurements, as players could not really recover their reward rating (without improving their playing strength).

To get back to Valentin's point: When calculating the effect of a game on ratings, you could then take the average shadow rating (and shadow RD) of both players over the game period as their ratings. (That may be quite cumbersome).

jay

Oct 17, 2009

0

#89

Thanks ichabod! I'll adjust the formula to square c in the RD' calculation. And yeah, excuse the Hungarian, but I am a big fan at this point. :) The framework I use uses Hungarian, so I just wanted to be consistent.

unohoo

Oct 17, 2009

0

#90

I was looking for an answer to how many days of absence before the rating deviation goes up. I know after a two week vacation that my normal win/loss goes from 8-10 to 20-30 per game.

ichabod801

Oct 17, 2009

0

#91

@jay: it sounds from erik's first post that you are meaning to set a floor for K at 16, but I don't see where you are doing that in your code. That's something that FICS does, but isn't in the explanation on Glickman's site.
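For reference, a floor on K could be sketched in Python as below. It uses the one-game form of the Glicko update, in which the rating change is K * (score - E). How FICS or chess.com actually applies the floor is not shown in this thread, so the `max` call, the function name and the sample numbers are assumptions.

```python
import math

Q = math.log(10) / 400
K_FLOOR = 16  # minimum K mentioned in this thread


def k_factor(rd, opp_rd, expected, floor=K_FLOOR):
    """Per-game K, never below `floor`.

    Unfloored K = q*g / (1/RD^2 + q^2 * g^2 * E * (1 - E)),
    so that the rating change is K * (score - E).
    """
    g = 1 / math.sqrt(1 + 3 * Q ** 2 * opp_rd ** 2 / math.pi ** 2)
    k = Q * g / (1 / rd ** 2 + Q ** 2 * g ** 2 * expected * (1 - expected))
    return max(k, floor)


print(k_factor(rd=40, opp_rd=40, expected=0.5))   # small RD: the floor applies
print(k_factor(rd=120, opp_rd=40, expected=0.5))  # larger RD: K is above the floor
```

With a floor like this, a very active player with a small RD still moves by a noticeable amount per game, which is worth keeping in mind when judging whether ratings change "too much".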

zankfrappa

Oct 17, 2009

0

#92

Yes Jay, as I mentioned back in Post #31, c should be squared, and even a very small adjustment will probably be noticed. I still think you may want to raise t a bit as well. In any case it will be fun to see how the ratings are affected by the changes.

meniscus

Oct 17, 2009

0

#93

Sturtian wrote:

meniscus wrote:

lol- good suggestion, but I don't know why you went "emo" on the last sentence.

emo? You mean the "ignore me if you don't like my opinion bit"? My experience of this kind of forum is that anyone with anything approaching a reasonable idea had best be wearing asbestos underwear.

I'm shrinking the font due to the offtopic conversation. You can cut and paste it and enlarge easily in any case.

I did think it was a good suggestion, emo refers to that sentence, yes. Typically emo is a label for any statements/attitudes that construe a low self-esteem or an expectation of disappointment. In this case it looked like you expect to be ignored. I didn't ignore you, although it was in fact the sentence about being ignored that got my attention. It appears this is a good method for attention grabbing, as once again, your last sentence deserves comment/questioning. Please explain...

The sentence appears to be semi-emo as well, but I'm not sure until I understand the "asbestos underwear" analogy. I'm trying and I've come up with two quick queries.

Is asbestos just another dangerous substance? Would arsenic underwear suffice--meaning that one must have briefs/boxers that are poisonous or contaminated by carcinogens in order to draw the sympathy needed for their answer to be considered?

OR is it that asbestos fortifies the undergarment in a way that stops attacks to your given answer (which is stored in said underwear) or to the groin (meaning that vicious attack is the standard response to reasonable answers)? In this case I'm not sure why asbestos is a superior substance to steel or titanium, etc, as both would protect the area without damaging the contents of the underwear.

You see the difficulty I'm having "getting" your last sentences. The first msg appeared "emo"; the response to my observation of its appearing emo appeared both emo and confusing.

Perhaps the asbestos underwear is a metaphor for some other quality that one should have before suggesting an answer in these "types of forums". Perhaps the uses of asbestos include something I have yet to discover, something that is witty or clever when combined with underwear to describe the purpose of emo-appearing suggestion disclaimers.

I believe that you did 'approach a reasonable answer', and you're obviously in "these forums", so I assume that you must in fact be 'wearing' these mythical or metaphorical garments. Sorry for the long explanation, but I'd just like to understand the cryptic analogy so that my sense of this sort of statement--asbestos and/or underwear related references--can improve.

Sincerely, Confused Responder.

[sm font size due to tangent topic]

meniscus

Oct 17, 2009

0

#94

Jay-

I'll just go ahead and add what pertains to the actual rating discussion here:

Fics is often mentioned in the Glicko conversation. Is the system exactly the same as the one used by chess.emrald.net? Perhaps their formula contains an improvement? Just a thought. In either case, I think we should go our own way. That's why I'm commenting.

It appears that more than one person has suggested that you should create a "new" rating system. I agree. I think this wonderful website needs something unique. Why don't you? More importantly, why don't we? It is far less time consuming and demanding to the individual if we organize.

Since we're looking at changes to and hybrids of the Glicko system already, I suggest that you (in addition to this thread) start a project group dedicated to this sort of quest--a private group that math people and those who understand rating systems in general would be encouraged to join. There, the best original ideas and improvements to existing systems could be debated, tested and compared by those who wish to help. I would love to suggest ideas myself, but I'd want only "math people" there of course.

A second, more subtle, valid reason to move the project to a group will be apparent if you read my last comment: one user has alluded that something about the forums tends to do something negative (although I don't understand what) that discourages or overlooks "reasonable suggestions". The conversation between the two of us is certainly off the topic. A more private forum would perhaps diminish this perception. Basically, he might be right: perhaps there are those who have excellent ideas and suggestions that would prefer a less-public forum? I myself made my blog "friends only" and contribute most if not all of my actual work/chess analysis to private groups only--to avoid the random comments.

I agree with others that a new or hybrid system is the way to go. However (IMO), introducing a brand new system that is completely conceived, proofed, edited and implemented by chess.com users and staff is too easy, cost-effective (free) and revolutionary an idea to pass up.

How often can we do something that is both good for chess.com and for chess itself?

That's my best suggestion at this point-- it might not assist the rating questions directly, but the members of the project will. I even have a suggestion for a name: The RRRP, or Revolutionary Rating Research Project, headed and moderated by you. It's got an easily remembered acronym--doesn't it sound similar to "AARP"? We will not try to sell elderly people insurance policies, of course.

-Tim

Chess.com members:

Who thinks this is a good idea? Would you join/contribute to this project should Jay decide to pursue it?

jay

Oct 17, 2009

0

#95

The goal of a rating system is to provide an accurate estimate of someone's skill level, in this case in the game of chess. The current rating system already accomplishes this fairly well. Given two players' ratings, you can come to a pretty accurate guesstimate of each person's chances of winning the game (or drawing). There just isn't any really good reason to completely overhaul or dump this system and create our own, custom rating system. We're simply looking to make small tweaks to solve some very minor problems, namely that ratings fluctuate too far too fast, even when the player(s) involved are very active on the site. With the long list of todo's at Chess.com, it would be a disservice to the users to spend a vast amount of time on a ratings project when we could be spending our time improving the site in other areas. Thanks for your help & suggestions! :)

mathijs

Oct 17, 2009

0

#96

Jay, I agree that there probably isn't any need for drastic "improvements". The change I described, with the shadow rating, was more to illustrate the point (valid, in my opinion) that there are two sides to rating at present. I would be quite opposed to it, if it were seriously discussed.

But I do wonder, with respect to the tweaking, just what is "too fast too far"? Aren't those completely arbitrary measures? As far as I can see, any rating system is excellent if it always rates a better player higher than a weaker one; by how much is irrelevant. Or is this all a discussion about what the proper period for becoming completely uncertain is?

costelus

Oct 17, 2009

0

#97

Hello Jay! I have a question, although I don't care much about ratings and I definitely agree that this should be at the bottom of your to-do list. Doesn't Tactics Trainer use the same type of rating (Glicko)? Then why, if I miss 5-6 tactical problems, does my rating collapse by something like 300-400 points, while in live chess it would go down by only 30-50 points after 5-6 losses?

LATITUDE

Oct 17, 2009

0

#98

[COMMENT DELETED]

jay

Oct 17, 2009

0

#99

We're working on fixing the TT ratings as well. :)

meniscus

Oct 17, 2009

0

#100

jay, i respect that. perhaps it shouldn't be a chess.com thing... anyway what about emrald.net's glicko: is it different than ours?