Rating adjustments and unbalanced teams

I have experienced similar problems with uneven pairings. My bughouse rating is modest, around 1400. It has happened more than once that I was paired with a partner rated 1600 or lower, and we were up against two 2000+ players! That's a difference of more than 1000 rating points between the two teams combined.

I've also noticed this when I win a game where I'm out-rated on my board but my team has a higher average rating. My rating goes up by more than I would expect, given that my team was stronger.
In my case, my chess.com rating for standard games is below 1200. Should my bughouse rating be in that range? Because right now it's at 1700.

We are going to change it soon!

@piotr (pee-yat) ...
In regard to changing things soon to accurately reflect rating adjustments, changes, etc...
When a higher-rated player resigns a game due to lack of interest, it needs to hit their rating the same as it would anyone else's. (No exceptions, unless Chess.com has a rigged scoring system favoring higher-rated players.)
If Chess.com has a rigged scoring system for "Live" (or Daily) play which specifically benefits members who are either highly rated or otherwise paid by Chess.com to produce content, this should be specifically disclosed. Such as giving @Ginger_GM 45 seconds to resign or time out a 3-minute time control, versus 15 seconds for the average player. (This happened to me earlier today: I was matched against Ginger_GM, who refused to play, but instead of the typical 10 or 15 seconds to time out or resign, sat there for about 45 seconds not playing.)
Getting fed up with this whole "We don't want to play, so we'll resign and not take a score hit" routine. Whereas I bet if I were to resign against a much better player, I'd take a score hit. (Even if it displays "0" instead of "1" or "2".)

Great, thanks!

@MGleason ... @piotr clearly said that things will be changing and/or getting updated soon... I wouldn't bet on it too much until after whatever change actually lands.
Things are not what they seem.
Chess.com has had numerous issues over the past few months and years (since v2 and initial Android/iOS Mobile roll-outs).
There have been numerous issues with the v3 rollout... many of the Bughouse issues have been addressed. (Thankfully.)
But, given the current state of things, in addition to some significant issues with STANDARD Chess.com tournaments of late... I'm betting (good money) that the issues which have been raised and not yet properly addressed won't be fixed for another couple of months, if not a year.
When so many things go wrong and are observed by so many people in such a short span... and I'm just one player among many (others may be having problems with other Chess.com services)... the odds are Chess.com has bitten off more than it can chew and has some bad devs behind the scenes, and that's why they're not saying anything or apologizing for obvious issues they've overlooked. (Or they've simply been unable to test all of the new / recent development, which then negatively impacts other, existing features that aren't in a frozen code branch that would flag warnings about conflicting commits, etc.)
Yes, I'm a programmer; I know and understand how these things work and why and how they go wrong. (Not that hard to figure out and address if you know what you're doing and are paying attention.)

You raise an interesting point...I think chess.com is understaffed in the programming department TBH; I don't really believe the chess.com programmers are bad or anything like that.

I would disagree. There's a list of all Chess.com staff, and I don't think they're understaffed (in the programming department or otherwise). But, based on my experience, the more you add "features," the more you have issues, especially if you're running agile or continuous integration and not keeping track of what changes, when, why, and what effect one change has on another stable feature. (I have [many] horror stories to tell about software development environments that are... well... run by business monkeys sending e-mails from typewriters, trying to write the next listicle instead of Shakespeare... all bad stuff anyway.)
If I had to put my finger on it (just off the top of my head): the code behind the scenes is probably a bit of a mess (not uncommon), developers are running around making changes, and there's maybe not enough in the way of QA. That's the most common circumstance I've run into, even in "Enterprise" environments. And "Enterprise" environments typically have a ton of unnecessary QA people who don't really know what they're doing, or the code, or how changes relate after implementation and hand-off... and things just get mucked up. That sucks, but it's most often worse when developers are directly responsible for front-end integration with no QA at all ("Hey, test this with me!"). So I would guess there's some QA, but nothing "formal" to validate and verify that a change affecting one area of code doesn't affect another (even though some of the code base may be shared). That's my experience (and best guess), in a nutshell, based on what I've seen around here.
Then again, businesses absolutely 100% LOVE to put out job descriptions saying "5 years of experience" and then hire the person who has, at best, maybe a year or two under their belt, just because a good impression was made. (The other side of my "experience" -- cynicism and sarcasm after years of seeing the extenuating circumstances behind things that shouldn't go wrong if truly experienced people were hired.)
It's like talking trash OTB... except in the business world, it's talking big in the interview. (Pedal to the metal, knuckles to the bone... a different story than talking a good game.)

Hmm...I haven't studied the topic in as much detail as I'd like, so I can't comment either way. Interesting post.

Server update: Now using average bughouse ratings for calculations!

When you say "average" -- can we get clarification?
Does that mean teams share the burden, as opposed to individual PvP calculations?
And if teams share the burden, have there been adjustments to scheduling and to who is matched with whom in RvR or RvT (R=Random, T=Team)?
Also, perhaps more importantly, are you looking for any observations or feedback in particular, or just general observations (if something is noticed)?

Here is an example calculation. Team rating is averaged, individual RDs are considered.
Bughouse teams:
P1 (1600, RD=40) + P4 (2000, RD=120)
vs
P2 (1650, RD=80) + P3 (1800, RD=30)
Bughouse game:
P1 vs P2
P3 vs P4
Individual calculation for win/draw/loss (PREVIOUS):
P1: +9, +1, -7
P2: +15, -2, -20
P3: +12, +4, -4
P4: +18, -18, -54
Average calculation for win/draw/loss (CURRENT):
P1: +6, -2, -10
P2: +21, +4, -14
P3: +10, +2, -6
P4: +28, -7, -42
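For anyone curious how "team rating is averaged, individual RDs are considered" might translate into math, here is a minimal Python sketch. It uses the standard Glicko formulas, represents each side by its average rating, and (as my own assumption) averages the opposing RDs as well. Chess.com hasn't published its exact formula, so the outputs only roughly track the numbers above (closer for high-RD P4 than for low-RD P1), and the function names are mine.

```python
import math

Q = math.log(10) / 400  # standard Glicko scaling constant

def g(rd):
    """Glicko attenuation: an uncertain (high-RD) opponent moves your rating less."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)

def rating_change(player_rd, own_avg, opp_avg, opp_rd, score):
    """One player's Glicko-style change for a single game, with the expected
    score taken from team AVERAGE ratings (score: 1 win, 0.5 draw, 0 loss)."""
    e = 1 / (1 + 10 ** (-g(opp_rd) * (own_avg - opp_avg) / 400))
    d_sq = 1 / (Q**2 * g(opp_rd)**2 * e * (1 - e))
    return Q / (1 / player_rd**2 + 1 / d_sq) * g(opp_rd) * (score - e)

# Teams from the example above: (rating, RD)
team_a = [(1600, 40), (2000, 120)]   # P1 + P4
team_b = [(1650, 80), (1800, 30)]    # P2 + P3

avg_a = sum(r for r, _ in team_a) / 2        # 1800
avg_b = sum(r for r, _ in team_b) / 2        # 1725
opp_rd = sum(rd for _, rd in team_b) / 2     # assumption: average the RDs too

for name, (_, rd) in zip(("P1", "P4"), team_a):
    deltas = [round(rating_change(rd, avg_a, avg_b, opp_rd, s))
              for s in (1.0, 0.5, 0.0)]
    print(name, "win/draw/loss:", deltas)    # P4 comes out near +29/-8/-44
```

Note how the RD enters twice: it scales the size of each player's own adjustment, while the win expectancy is shared by both teammates because it comes from the team averages.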

That looks good to me. It shows that P1+P4 is now properly considered the stronger team, unlike the old system where P2 and P4 were considered favourites even though they are on opposite teams.
It's likely that a straight average is still imperfect. For example, a couple of 1100s might be favourites against a 2000 and a 500, due to the 500's tendency to commit suicide very rapidly, even though the 2000+500 team has the higher average rating. But a straight average is definitely better than what we had before, and a perfect system might be nearly impossible to construct.

I think using the average is good. I definitely see MGleason's point that it is imperfect, though. Maybe some formula like this would work instead of using the plain average rating:
Average_Rating + Rating_Difference*c
(Rating_Difference = rating difference within a team.)
The constant c would be based on statistics about how a high rating difference within a team improves/worsens the chance of beating a team whose rating difference is 0.
For example, if c = -0.2, it would mean that a team with players rated 500 and 2000 (average 1250, difference 1500, so 1250 - 0.2 × 1500 = 950) is supposed to score 50% against a team of two players rated 950.
I don't think my formula is perfect either and I certainly would not like it implemented without some statistics on what value is appropriate for c, so I think the current implementation with average is good.
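To make that concrete, here is the proposed formula as a tiny Python sketch (the value c = -0.2 is purely illustrative, as noted above, and the function name is mine):

```python
def effective_team_rating(r1, r2, c=-0.2):
    """Hypothetical team strength: average rating plus a correction
    proportional to the rating gap within the team (c is illustrative)."""
    return (r1 + r2) / 2 + abs(r1 - r2) * c

# With c = -0.2, the 500+2000 team is treated as exactly as strong
# as a team of two 950s:
print(effective_team_rating(500, 2000))   # 950.0
print(effective_team_rating(950, 950))    # 950.0
```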

The rating system currently appears unsound. I'm not certain, because I don't know exactly what it's doing, but I recently took a big hit to my rating in a game where I was paired with a ~600 partner against two players I think were about 1100. My partner got himself killed pretty quickly, which is no surprise for a 600 player up against an 1100 player.
I'm guessing the system saw that I was playing against someone rated below me and scored it as a loss to a lower-rated player, so my rating took a pretty big hit. I would expect that if I had actually lost to someone rated below me.
However, I didn't lose to someone rated below me. My team lost to a team with a higher average rating. Given that my partner was around 600, we were actually pretty out-rated.
Since my win probability is determined by the strength of all four players, not just the two on my particular board, my rating adjustment should be driven by the rating of all four players, not just the two on my particular board. A 1200 and a 2000 against a 1400 and an 1800 is a roughly fair fight, and should be scored as such; a 1200 and a 2000 against two 1400s is not, and the rating adjustment should consider the 1200/2000 team to be strong favourites.
It's possible this is already being done in some fashion. However, based on the adjustments I've gotten, it seems like the rating difference on my particular board has a bigger impact than the average rating difference between the two teams.
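To put rough numbers on the matchups above: using a plain Elo expectation over team averages (just an illustration, not necessarily chess.com's actual formula, and the function name is mine), the two cases come out like this:

```python
def team_expected_score(team_a, team_b):
    """Elo-style win expectancy for team_a, from team AVERAGE ratings."""
    avg_a = sum(team_a) / len(team_a)
    avg_b = sum(team_b) / len(team_b)
    return 1 / (1 + 10 ** ((avg_b - avg_a) / 400))

print(team_expected_score([1200, 2000], [1400, 1800]))  # 0.5   -- a fair fight
print(team_expected_score([1200, 2000], [1400, 1400]))  # ~0.76 -- strong favourites
```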