The most interesting thing is the bump at 2000 caused by inactive women... The inescapable conclusion: when women get "very good" at something which is, after all, just a hobby (I call 2000 very good, but it will not make any money), they are satisfied and move on to something else. The critical moment is getting to 2000, a significant landmark. That's a personality difference, not a cognitive one... any other theories?
Data analysis: Difference between Male/Female ratings
I wouldn't characterize it as "very poor." Is it sub-optimal? Of course, but so is everything in an observational science. By having the experts make their determinations in isolation, the well-known false-consensus effect is avoided, so the 75% preponderance is genuine. As I mentioned in an earlier post, any assessments that turn out to be objectively wrong would tend to be rare and symmetrical, and should be expected to have a minimal impact on the results obtained over the large number of games analyzed.
I'd actually like to see the full list of ECO codes along with the expert votes (preferably in a format that allows us to reconstruct each expert's entire questionnaire). I really doubt we'd see many (any?) that were assessed with more than one divergent vote; e.g., A=6 and S=2. I suspect that for the vast majority of openings that were assigned to either of these categories, all the dissenting votes were for "unclear." I have e-mailed the authors for this data. Almost every such query I've made in the past has been answered, so when I get a response I'll post any ECO codes that got opposite votes here (or perhaps in another thread; we've kind of hijacked this one.)
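If the raw votes do arrive, flagging ECO codes with genuinely opposite assessments is trivial. Here's a minimal sketch, with invented tallies in (solid, unclear, aggressive) order; the real numbers would have to come from the authors' data:

```python
# Hypothetical per-ECO vote tallies: (solid, unclear, aggressive).
# These counts are invented placeholders, not the study's data.
tallies = {"B90": (0, 2, 6), "D37": (6, 2, 0), "C45": (2, 1, 5)}

# An ECO code is "divergent" if it drew at least one vote on each extreme.
divergent = {eco: t for eco, t in tallies.items() if t[0] > 0 and t[2] > 0}
print(divergent)  # here only C45 has both "solid" and "aggressive" votes
```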
Very interesting thread! Lots of data to look at. Are they reliable? And if so, ...
The hard bit will be how to interpret the stats and draw some reasonable conclusions. Planning on spending an evening on it next week!
@FiveofSwords: "The only thing that is quite clear" is that you've made your conclusion without reading either the paper or most of the discussion we've had about it. The "jury," if you want to call it that, was five men and three women... not all men. When I get a response to my e-mail to the authors, I'll look for patterns in the opening assessments among the experts. I doubt we'll see strong gender disparity there, but right or wrong, I'll post an update here.
I strongly recommend you take a course in statistics and probability before you make further comments about whether conclusions can be drawn from a statistical study. The math has built-in error checking... if the assessments were as wholly unreliable as you want to pretend they are, there would be no correlation at all. Yet there is an undeniable pattern of women choosing openings that somehow just "happened" to be classified as "solid" more often than men. That has to mean something, and since the researchers controlled for likely confounding variables, that something would almost necessarily be what the researchers set out to measure in the first place. Furthermore, other studies that measure risk tolerance by gender, in areas as diverse as investment strategies and driving habits, show the same tendencies very consistently.
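For what it's worth, the attenuation point is easy to demonstrate. Here's a minimal simulation sketch, with made-up numbers (a true ten-point gap in "solid" opening choice, and experts who mislabel openings at some symmetric rate); it shows that random misclassification can only shrink a real gap toward zero, never manufacture one:

```python
import random

random.seed(0)

def observed_gap(n_games=100_000, solid_rate_women=0.60,
                 solid_rate_men=0.50, flip_rate=0.0):
    """Simulate solid/aggressive choices, then flip each label with
    probability flip_rate (symmetric expert misclassification)."""
    def rate(true_rate):
        solid = 0
        for _ in range(n_games):
            is_solid = random.random() < true_rate
            if random.random() < flip_rate:  # expert mislabels this opening
                is_solid = not is_solid
            solid += is_solid
        return solid / n_games
    return rate(solid_rate_women) - rate(solid_rate_men)

for flip in (0.0, 0.1, 0.25, 0.5):
    print(f"flip_rate={flip:.2f}  observed gap = {observed_gap(flip_rate=flip):+.3f}")
# The true 0.10 gap shrinks as mislabeling grows, and vanishes only when
# the labels are pure noise (flip_rate = 0.5); errors never invent a gap.
```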
Point out one specific instance of a strawman or red herring on my part. I won't hold my breath, because you won't find one.
If you majored in mathematics, you've gone out of your way to leave absolutely no evidence of that fact in anything you've posted here. I'm tempted to PM you a simple problem to test the veracity of that claim.
It's not a red herring when it's immediately followed by discussion of the relevant issue. Red herrings are intended to distract; fallacies are used to avoid discussion. This is clearly not what I was doing. To the contrary, I pointed out your false implication that the expert assessors were all male, and then proposed a way to test whether that implication really even matters.
Now, had my entire post consisted of the quote you extracted into a tortured example, that would have been a red herring. Nice try, though.
I don't have a problem with modeling risk aversion via analysis of opening choices. Almost all science is done through analysis of models. A valid question is how closely the chosen model reflects reality. Intuitively, playing a cautious opening seems risk averse to me, and a consensus of experts is a valid method to assess how cautious or aggressive an opening is. Is it perfect? No, no model ever is... a fact I already conceded. Does that make the model worthless? I would say no.
The only one I see here hand-waving about philosophy is you; I never even mentioned the word and intentionally avoided the subject after its spurious introduction. I prefer to leave such intangibles far away from anything as concrete as data analysis. Otherwise things could quickly degenerate into brain-in-a-vat solipsism.
As I prophetically mentioned in post 75, this study required a redefinition of the term 'risk aversion' away from our intuitive understanding into some artificially quantitative measure. That's the problem. Ponder that for a while.
It's not a redefinition. You are using the GMs as risk detectors, because they seem to be the best thing around for judging whether a strategy is risky or solid, and then you use their input to crunch the numbers.
What the GMs say isn't artificial.
But indeed, you could take the argument I made and extend it further, and realize that the only real judge should be the exact same people who made the move.
And then you run into the problem of an unblinded study, along with all the bias that comes with it. Not to mention that you'd now have 15,000 uncontrolled, completely subjective definitions for "risk aversion." Talk about a useless metric! That would constitute a truly insane approach to behavioral science.
Most of our qualitative assessments are quantified BTW.
Nobody objects that scientists using clocks is a bad thing because it redefines our intuitive understanding of time, lol.
I mean, would you prefer that scientists just estimate for themselves whether something took 10,252 or 10,637 minutes, without using a clock, because that would be less "artificial"? I think not.
Probably it would be much the same anyway. I agree that the more people on the panel, the better, but you also shouldn't over-concern yourself with the precision of the instruments.
You could also demand that 15,000 scales be used every time anything was weighed, but then we wouldn't be able to do much weighing :)
Time is measurable.
The qualitative assessment is not. You can ask people about it, but you cannot measure it.
Ten minutes can be evaluated completely differently in qualitative terms, depending on how much stuff happens in those ten minutes.
There are more objective measures of "solid" vs. "aggressive" play than having a few masters (2000-2600 strength) say that an opening is solid or aggressive. I'm pretty sure that if 8 other experts were asked, they'd give different assessments of the openings. Heck, if you asked the same experts 5 years from now, they probably would not agree with their previous assessments.
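Incidentally, that's a testable claim: agreement among raters can be quantified rather than guessed at. A minimal sketch using Fleiss' kappa, assuming each ECO code gets a row of vote counts over the three categories; the counts below are invented for illustration:

```python
def fleiss_kappa(votes):
    """Fleiss' kappa for votes[i] = per-category vote counts on item i.
    Each row must sum to the same number of raters (here, 8 experts)."""
    n_raters = sum(votes[0])
    n_items = len(votes)
    # Mean observed agreement across items.
    p_bar = sum((sum(c * c for c in row) - n_raters) /
                (n_raters * (n_raters - 1)) for row in votes) / n_items
    # Expected agreement by chance, from marginal category frequencies.
    totals = [sum(row[j] for row in votes) for j in range(len(votes[0]))]
    p_exp = sum((t / (n_items * n_raters)) ** 2 for t in totals)
    return (p_bar - p_exp) / (1 - p_exp)

# Invented (solid, unclear, aggressive) vote counts for four ECO codes:
votes = [[7, 1, 0], [1, 2, 5], [6, 2, 0], [0, 1, 7]]
print(f"Fleiss' kappa = {fleiss_kappa(votes):.2f}")  # ~0.39: moderate agreement
```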
Some more usual measures of aggressive play include analysing how quickly players tend to agree to draws, how many decisive games end before move 30, and so on.
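Both of those are straightforward to compute. A minimal sketch, assuming nothing more than (result, total moves) pairs per game; the sample records below are invented:

```python
# Hypothetical game records: (result, total_moves); results are
# "1-0", "0-1", or "1/2-1/2". The five sample games are invented.
games = [("1-0", 24), ("1/2-1/2", 18), ("0-1", 41),
         ("1/2-1/2", 62), ("1-0", 29)]

draws = [m for r, m in games if r == "1/2-1/2"]
decisive = [m for r, m in games if r != "1/2-1/2"]

print(f"decisive before move 30: {sum(m < 30 for m in decisive) / len(games):.0%}")
print(f"draws agreed before move 30: {sum(m < 30 for m in draws) / len(games):.0%}")
print(f"average draw length: {sum(draws) / len(draws):.1f} moves")  # assumes draws exist
```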
And then there's the work of Guid and Bratko that actually tries to determine the level of complexity in play. They at least acknowledge that their work is tentative and not yet ready for broad usage.
The qualitative assessment of time isn't time. If you want to know how far a ball travels in 5 minutes, science can help you. If you want to know what you should do for the next 5 minutes because you are already very bored, don't look for answers in mathematical models.
Qualitative time is qualitative time, and quantitative time is quantitative time.
If qualitative time didn't exist, we wouldn't exist either, since we'd die if we weren't able to tell, for instance, when "now" is.
To be able to think "I will run out of this burning building now or I will die," you need to know when "now" is, and the way you do that is qualitative assessment.
If you only used quantitative time, you would first need to define a t=0 on a time scale and have a device that can measure quantitative time to find out when t=0 is.
Obviously nobody would do it this way; you'd just qualitatively know that now is now and run out. A robot would just stand there and burn if its quantitative-time device was broken, because it would never be now.
Here's another point: They say the experts ranged in rating from 2000 to 2600.
Let's say that five of the experts were rated 2000-2300, two were rated 2300-2500, and one was rated 2600. If the five lower-rated players all said an opening was "aggressive" and the three higher-rated players all said it was "solid," who would be right?
Before you say the higher-rated players are right, consider who is playing. Peers may be right for players at their own level but not for those at another level. The whole notion of having experts determine universal aggressiveness based solely on the first few moves is dubious.
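Even if you tried to settle it by weighting each vote by rating (an assumption of its own: that stronger players judge better), the split above wouldn't change the answer. A toy sketch with invented ratings matching that hypothetical panel:

```python
# Toy illustration only: weight each expert's vote by their rating.
# Ratings and votes mirror the hypothetical 5-vs-3 split above.
experts = [(2000, "aggressive"), (2100, "aggressive"), (2150, "aggressive"),
           (2250, "aggressive"), (2300, "aggressive"),
           (2400, "solid"), (2500, "solid"), (2600, "solid")]

weights = {}
for rating, vote in experts:
    weights[vote] = weights.get(vote, 0) + rating

print(weights)                        # {'aggressive': 10800, 'solid': 7500}
print(max(weights, key=weights.get))  # 'aggressive': five votes still outweigh three
```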
There are a few objective parameters of aggressiveness and risk in chess. I think nobody will disagree with the fact that the Najdorf, the King's Indian, the King's Gambit, and Larsen's Opening are riskier than the Caro-Kann or the English.
Well, I think it's trickier than you're making it out to be. Take openings, for example. I considered myself a positional player and thought that meant I should play the QGA or QGD as Black. But actually, those positions can become wide open (pawn exchanges in the center happen very often), and thus tactics appear all over the place, even into the endgame. So the labels can get really tricky. Someone who is good in open positions might decide to play something like the QGA or QGD despite their "solid" reputation. And for me? As long as I played openings like this as Black, they didn't really suit me, because I didn't want to calculate lots of tactics in an open position; not to mention the defensive tactical skill needed to repel any attacking attempts by White. Maybe in terms of pawn structure they were "solid," but otherwise it's a very misleading description, and because of it I found myself in the wrong opening for quite some time.
So it's tricky sometimes, to say the least. This is kind of what I mean when I say that it can be like breaking down a super complex concept like intelligence into a test. As hard as one may try, it may have serious limitations.
Ah, I see you've posted a link to the full paper in the last 5 hours. I'll peruse that before making further comments.
Made the same mistake I did just a little earlier :)
"maybe the only reason these expers think a move is aggressive is because they arent women."
But that would still point to a gender difference, right?
"Is it sub-optimal? Of course, but so is everything in an observational science."
Agreed, but all that follows from that is that we can't blame people for trying their best. It might still be that their "best" isn't very useful. There might just be significant human limitations in answering the question they want to answer in an effective way.

I admit, I misread it as 6/8 rather than 5/8 masters agreeing. Still, that is a very poor method for determining whether a person's play is aggressive or solid.