Some stats / NS, FFA

Indipendenza

If I am not wrong, the main reason why the set-up changed last year was a significant unfair disadvantage for Green. If I remember correctly, Green's win rate was under 23%. (In a fair game distribution it would of course have been 25%.)

I analysed some 2366 NS FFA games (thanks to Space for the data!), taking only (reasonably) HL games, namely where only 2500+ players were involved. I expected the green disadvantage still to be significant. But it's not the case:

Red won in 27.3% of the games, Blue 24.8%, Yellow 23.0%, Green 24.9%.

I.e. now the worst colour is Yellow! Unexpected (for me), it was not my intuitive feeling.

And the best colour (Red) still wins 19.2% more often, in relative terms, than the worst colour (Yellow). Not very good. Maybe some compensation method should be found (for instance, giving X points from the start to B, Y and G!).

Furthermore: RY took the first two places in 23.8% of the games, BG in 23.4%, and in 52.8% of the games the first two places did not go to a pair of opposites. Looking in more detail: if Blue is first, Green has a 47.1% chance of being 2nd, whereas for Red it's 27.8% and for Yellow 25.1%! I.e. despite the change of the formula (which makes it solo for HL players), either they still tend to "reward" their opp, or maybe the configuration of the board causes that. On average, the 2nd place goes to the opp in 47.2% of cases and to each side in 26.4%.

I shall continue this thread later, have to go for dinner.

Indipendenza

There are (of course) 24 possible finishing orders for the 4 colours. In a totally fair game each of them would have a probability of 4.2% (1/24).

What can we see? Some configurations are much more frequent (for the reason mentioned above: cooperation between opposites). The worst imbalance is between:

Red, Yellow, Blue, Green : 6.8%

Red, Green, Blue, Yellow : 2.5%.
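
To put these two figures next to the fair share, here is a minimal Python sketch. The two observed percentages are copied from this post; the variable names are only illustrative.

```python
from itertools import permutations

colours = ["Red", "Blue", "Yellow", "Green"]

# Every possible finishing order (1st, 2nd, 3rd, 4th) of the four colours.
orders = list(permutations(colours))
fair_share = 1 / len(orders)  # each order is equally likely in a fair game

# Observed frequencies quoted in the post above.
observed = {
    ("Red", "Yellow", "Blue", "Green"): 0.068,
    ("Red", "Green", "Blue", "Yellow"): 0.025,
}

# How many times the fair share each observed order represents.
ratios = {order: share / fair_share for order, share in observed.items()}

print(len(orders), f"orders, fair share {fair_share:.1%}")
for order, ratio in ratios.items():
    print(", ".join(order), f"-> {ratio:.2f}x the fair share")
```

In other words, the most frequent order shows up roughly 1.6 times as often as it would in a fair game, and the rarest about 0.6 times.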

spacebar

I think you aren't taking the small sample size into account enough.

statements like 23.8% are misleading. you'd need to say: 23.8 +/- 2% (not sure but the error margin is very high). Should keep that in mind when trying to draw conclusions.

Fwiw i did run stats a while ago for oma, over all games, not just 2500+, and the results showed very clearly that oma is much more balanced.

57% win rate for RY in teams dropped to 52%.

and for FFA, results were relatively even. i dont remember, but the problem of old setup where RY got 4th 20% each and BG 30% each, does not exist in oma. (for FFA stats I did only look at high rated games)

spacebar

fwiw i think you should focus on who got 4th, not who won. I don't think the setup or colors are significant once the 3 player stage is reached, esp when players are high rated. for lower rated, it will depend a lot on who is in the middle, that player will take 3rd much more often.

Indipendenza

Not really; the sample is not huge, it's true (it would be much better to have 10000 games for instance, as I asked, or even 50000), but it is not small. The error margin is much lower than 2% in fact.
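
For anyone who wants to check the size of the error margin, here is a minimal sketch of the usual normal-approximation formula for a share observed over 2366 games. It assumes the games are independent (a simplification) and a 95% confidence level (z = 1.96); the function name is just illustrative.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Half-width of the normal-approximation confidence interval for a share p over n games."""
    return z * math.sqrt(p * (1 - p) / n)


n_games = 2366

# Shares quoted earlier in the thread: RY top two, Red wins, Yellow wins.
for share in (0.238, 0.273, 0.230):
    standard_error = margin_of_error(share, n_games, z=1.0)
    margin_95 = margin_of_error(share, n_games)
    print(f"{share:.1%}: standard error {standard_error:.2%}, 95% margin +/- {margin_95:.2%}")
```

Under these assumptions, one standard error is a bit under 1 percentage point and the 95% margin is about 1.7 points, so which figure you quote depends on the confidence level you have in mind.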

Indipendenza
spacebar wrote:

Fwiw i did run stats a while ago for oma, over all games, not just 2500+, and the results showed very clearly that oma is much more balanced.

YES, I fully agree, it is more balanced!

But I don't think it's good to analyse all the games: I mean, only games between good players are really relevant statistically, because the lower the level, the closer the results are to random.

Indipendenza
spacebar wrote:

i think you should focus on who got 4th, not who won.

ABSOLUTELY, that's precisely what I planned to do next. But I was being waited for.

thenomalnoob
Indipendenza wrote:

If I am not wrong, the main reason why the set-up changed last year was a significant unfair disadvantage for Green. If I remember correctly, Green's win rate was under 23%. (In a fair game distribution it would of course have been 25%.)

I analysed some 2366 NS FFA games (thanks to Space for the data!), taking only (reasonably) HL games, namely where only 2500+ players were involved. I expected the green disadvantage still to be significant. But it's not the case:

Red won in 27.3% of the games, Blue 24.8%, Yellow 23.0%, Green 24.9%.

I.e. now the worst colour is Yellow! Unexpected (for me), it was not my intuitive feeling.

And the best colour (Red) still wins 19.2% more often, in relative terms, than the worst colour (Yellow). Not very good. Maybe some compensation method should be found (for instance, giving X points from the start to B, Y and G!).

Furthermore: RY took the first two places in 23.8% of the games, BG in 23.4%, and in 52.8% of the games the first two places did not go to a pair of opposites. Looking in more detail: if Blue is first, Green has a 47.1% chance of being 2nd, whereas for Red it's 27.8% and for Yellow 25.1%! I.e. despite the change of the formula (which makes it solo for HL players), either they still tend to "reward" their opp, or maybe the configuration of the board causes that. On average, the 2nd place goes to the opp in 47.2% of cases and to each side in 26.4%.

I shall continue this thread later, have to go for dinner.

You can see the imbalance between the colours. However, what we need is balance between the players!

Each player has roughly the same chance of being assigned any given colour. And even if you are green (the worst), you can normally build a fortress so as not to lose at the start. I think that in the matches in your data:

- Good elo in red and bad elo in green: green lost at start

- Good elo in green and bad elo in red: a long game

The more moves are made, the less significant the difference in advantage becomes.

Normally, if a player with a bad elo gets green, he has a lower chance of winning. But in a game between 4 strong players, each colour has the same chance of winning.

Indipendenza

NOW, as for being 4th.

Here it's much more balanced than for the 1st place:

Red 25.1%, Blue 25.7%, Yellow 24.2%, Green 24.9%. I.e. the delta between the furthest values is only 6.3% which is narrow (cf. 19.2% above, for the 1st place delta!). Basically one can't say that one colour is actually better than another, the difference is clearly within the margin of error.

Now, as for the opps.

When R is 4th, Y wins in 23.9% of the games.

When B is 4th, G wins in 26.8%.

When Y is 4th, R wins in 27.6%.

When G is 4th, B wins in 24.7%.

Unsurprisingly, all these values are well below 33.3%, i.e. it's definitely bad when your opp finishes 4th (which usually, but not always, happens when he is the first one eliminated), but that's no secret.

Indipendenza

@thenomalnoob, no, as you can see, G is no longer the worst colour (as it was in the Old Set-up). It's Y in fact!

YES, I agree with you, I am almost sure that a) the respective ELOs and b) how they are distributed around the board have a huge impact. (That's also why I've been insisting for years that we should change the formula: contrary to 2p chess, here how we sit influences the outcome. Currently the formula takes into account only the average ELO, whereas I am sure everybody will agree that winning a game with a 2700 in front and 2400s at your left and right is very much easier than winning with a 2700 as your left neighbour and 2400s in front and to your right!).

Indipendenza

Some more stats.

The highest rated won in 34.2% of the games, which is 36.9% more than chance (25%). One could expect it to be even higher. Basically, contrary to 2p chess, I think that in 4p chess the levels of the HL players are closer to each other, so the rating does predict the final outcome, but not accurately. The highest rated was 2nd in 24.1% and 3rd in 23.8% of the games, and 4th in 18.2% of the cases. He is 1st almost twice as often as 4th, but that's a no-brainer. (The sum is not exactly 100% because there were of course a few games where some players had the same rating, as I used rounded values).

AND: if you are not the highest rated, but you have the highest rated in front, you win in 22.7% of the cases... I.e. the opp of the highest rated takes 34.5% of the games that the highest rated does not win (22.7 / 65.8), which is too close to 33.3% to be considered a real statistical advantage. I thought it was higher.
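
The arithmetic behind these figures can be sketched in a few lines of Python. This is one reading of the numbers, assuming the 34.5% is the opposite's share of the games that the highest rated does not win; the inputs are the rounded percentages from this post, so the results can differ slightly from values computed on the raw counts.

```python
# Rounded shares quoted in the post.
top_rated_wins = 0.342        # the highest rated player wins
opposite_of_top_wins = 0.227  # the player sitting opposite the highest rated wins
chance_level = 0.25           # fair share for any one of four players

# Relative excess of the top-rated win rate over pure chance.
excess_over_chance = top_rated_wins / chance_level - 1

# Share of the games NOT won by the top rated that go to his opposite.
# With three remaining players, 1/3 would mean no advantage at all.
opposite_share = opposite_of_top_wins / (1 - top_rated_wins)

print(f"top rated vs chance: +{excess_over_chance:.1%}")
print(f"opposite's share of the remaining games: {opposite_share:.1%}")
```

From the rounded inputs the first value comes out at about 36.8%; the 36.9% in the post was presumably computed from unrounded data.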

Arjun1516

Can I see a doc, or are these just random numbers? Does anyone have any proof that this is real? It seems super fake that yellow is the worst color. I feel like all of this is fake. Using rounded values, so that the sum isn't exactly one hundred percent? (Why include those games instead of leaving them out?)

Indipendenza
Arjun1516 wrote:

Can I see a doc, or are these just random numbers? Does anyone have any proof that this is real? It seems super fake that yellow is the worst color. I feel like all of this is fake. Using rounded values, so that the sum isn't exactly one hundred percent? (Why include those games instead of leaving them out?)

Well, despite 30 years of experience in data manipulation and Excel, I could of course have made a mistake. If you send me your email via private message, I'll send you the file I use, and you'll see for yourself.

As for yellow for instance (yes, as mentioned above, I was surprised as well): out of 2366 NS (Oma) FFA Rapid games involving only 2500+ players, yellow won in 543 cases (22.950%), blue in 586 cases (24.768%), green in 590 cases (24.937%) and red in 647 cases (27.346%). Spacebar was quite right to say that the sample is probably not large enough (even if not small), but even taking this into account, one can still say that for B and G it's more or less fair (very close to 25%), while it's significantly better to be red than anything else, and significantly worse to be yellow than anything else.
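
One way to check these counts against a perfectly fair 25/25/25/25 split is a chi-square goodness-of-fit test. The sketch below is only an illustration and not the method used in the Excel file: it assumes independent games, computes the statistic by hand, and compares it with 7.815, the standard 5% critical value for 3 degrees of freedom.

```python
# Win counts per colour, as quoted in the post.
wins = {"Red": 647, "Blue": 586, "Green": 590, "Yellow": 543}

total = sum(wins.values())
expected = total / len(wins)  # wins per colour if the set-up were perfectly fair

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
chi2 = sum((observed - expected) ** 2 / expected for observed in wins.values())

# 5% critical value of the chi-square distribution with 4 - 1 = 3 degrees of freedom.
CRITICAL_5PCT_DF3 = 7.815

# Relative advantage of the best colour over the worst one.
red_vs_yellow = wins["Red"] / wins["Yellow"] - 1

print(f"total games: {total}, expected wins per colour: {expected}")
print(f"chi-square: {chi2:.2f} (5% critical value: {CRITICAL_5PCT_DF3})")
print(f"Red vs Yellow: +{red_vs_yellow:.1%}")
```

The statistic comes out at about 9.2, above the 5% threshold, with almost all of it contributed by Red and Yellow. The same counts also give the 19.2% Red-over-Yellow figure mentioned earlier in the thread.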

So no, it's not fake. But I can send the file to you and you can check for yourself. The game numbers are in there as well; I checked 3 of them just to be sure that my source data were accurate, and it was 100% the case, no mistakes and no corrupted data.

Indipendenza

Just for fun, I wanted to see what happens if I look only into games where all the players were 2700+. Of course the results become insignificant as the sample is VERY SMALL in this case (only 44 games!). But as for the winner, it was: blue 8, green 10, red 15, yellow 11.

The ONLY statistically relevant conclusion that we can draw here is that for these elite players, being RY is definitely a huge advantage (one of RY won in 59% of cases, whereas BG only in 41% of cases, but that's no surprise to anybody).
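
Since the sample is only 44 games, it may be worth seeing how often a split at least this lopsided would occur by chance. The sketch below computes an exact one-sided binomial tail, assuming (as a simplification) that with no colour advantage each game is an independent 50/50 between the RY pair and the BG pair.

```python
from math import comb

n_games = 44
ry_wins = 15 + 11  # red wins + yellow wins, from the counts above

# Probability of RY winning at least this many games out of 44
# if every game were a fair coin flip between RY and BG.
p_tail = sum(comb(n_games, k) for k in range(ry_wins, n_games + 1)) / 2 ** n_games

print(f"RY share: {ry_wins / n_games:.0%}")
print(f"P(at least {ry_wins} RY wins by chance): {p_tail:.3f}")
```

Under these assumptions the tail probability is roughly 0.15, so 44 games alone cannot separate a 59/41 split from chance; a larger sample of 2700+ games would be needed to confirm the effect.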

Indipendenza

Something else. I wanted to check what was the impact of the average rating of RY and BG on the result.

In 58.2% of the games (it's of course significant) the winner (whatever the colour) came from the "team" with the highest average rating.

B won in 62.1% of such games, G in 55.9% of such games, R in 59.0% of such games and Y in 55.4% of such games.

In other words, when RY (already somewhat privileged) were also presumably the strongest "team", which was the case in 1165 games, one of them won in 683 cases (hence 58.6%). When the strongest "team" was BG (1201 games), one of them won in 694 cases (hence 57.8%).
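
To see whether 58.6% and 57.8% really differ, here is a minimal two-proportion z-test sketch using the counts above. It assumes independent games and uses the pooled-variance normal approximation; the helper name is illustrative.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for the difference between two observed proportions (pooled variance)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error


# RY stronger on average: 683 wins in 1165 games.
# BG stronger on average: 694 wins in 1201 games.
z = two_proportion_z(683, 1165, 694, 1201)

print(f"z = {z:.2f} (|z| < 1.96 means no significant difference at the 5% level)")
```

The statistic is about 0.4, far below 1.96, so on this model the two "teams" convert a rating advantage into a win at indistinguishable rates.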

Therefore, as these values are very close and the sample is clearly large enough, we can conclude that the average strength has the biggest impact, that being RY does not in itself favour them, and that the game is balanced between the "teams" (but: it's much better to be red than yellow, as shown above!).

(I am almost sure that looking at all games, with no threshold (here 2500+), would heavily influence these precise findings, because only for HL players is the opp cooperation automatic).

Indipendenza

Yes, surprising, but it's the fact!

(At least for games 2500+ as I didn't look in LL games, I'm only interested in what happens for HL players).

LosChess

Who cares about Stats, or "balance", New Standard is a terrible setup.

Are more games being played vs Before the merge? How many good players are gone now thanks to the New Setup?

How often does the server crash vs pre merge?

Indipendenza

Yes, the merge (and the fact that several major changes happened at the same time) was a terrible thing. I fully agree. But still, it's off-topic IMHO.

LosChess
Indipendenza wrote:

Yes, the merge (and the fact that several major changes happened at the same time) was a terrible thing. I fully agree. But still, it's off-topic IMHO.

If the Community had a voice, we would have a different Standard setup. If I wanted to play a balanced repetitive game, I would play Checkers.

Indipendenza

I don't like the NS either (too little variety in the openings). And I hope it will be reconsidered. But it is clear that NS is better balanced than OS.

I wonder what the balance in BY is.

(My general proposal remains the same: a RANDOM set-up out of the 16 possibilities, every time... That would make the game more fun, more diverse, fairer and less predictable).