Interesting, this is really well done. One question, however: what is happening on the Dutch and French Defense to the White wins when it hits 0?
How to measure the drawishness of a game

What is happening on the dutch and french defense to the white wins when it hits 0?
Thanks for the compliments, everyone.
I did three more charts today (they're quite time-consuming, altogether) to get a feel for what happens, especially for the anomalies at the end that you mentioned. Those seem to be mostly a result of low sample size, the same thing that one sees on 365chess when some of the lesser (and inferior) move choices show 100% win rate for players making those moves. For example, there might be only 3 games at the end of the statistically measured games, with no draws, and the other games earlier in that line that no longer appear had draws that had been affecting the earlier statistics. The same anomalies appeared again in some of today's charts, though the Symmetrical English behaved as one would expect, and the one sharp opening behaved like the other sharp openings. Another reason for the anomalies might be that the most popular lines tend to be those that are sharper or lesser known, in other words lines that haven't been completely analyzed yet and that aren't completely boring.
I realized that a better way to measure the increase of drawishness over time is to fit a line to the plotted curves, but that will take a lot more equations and work, so I didn't derive any more numerical results for my latest three charts. The latest charts:
(5) King's Indian Defense, Classical Variation, Orthodox Variation - notoriously sharp
(6) Four Knights Game, Spanish Variation, Double Ruy Lopez - notoriously drawish
(7) English Opening, Symmetrical Variation - notoriously drawish




You're welcome. Yesterday I found out that Excel 2007 has a (least squares?) line-fitting function, but I'm using Calc, the free spreadsheet, so I can't use that function on my computer.
Also, I decided that a great way to get rid of the statistical anomalies at the end when computing drawishness statistics (or when fitting lines) is to weight the contribution of each plotted point proportional to the starting sample size, so that the points toward the end of the time span mean proportionally less. I don't know if I'll ever get around to all that, but ideally that's what I would like to do.
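Here is a minimal sketch of that weighting idea, assuming an ordinary weighted least-squares line fit (my own choice of method, not something taken from Excel or Calc). Each plotted point (ply, draw percentage) is weighted by the number of games still in the sample at that ply. The function name `weighted_line_fit` and all of the numbers below are made up purely for illustration; they are not real 365chess statistics.

```python
def weighted_line_fit(xs, ys, weights):
    """Fit y = slope * x + intercept by weighted least squares."""
    total_w = sum(weights)
    if total_w <= 0:
        raise ValueError("weights must sum to a positive number")
    # Weighted means of x and y
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total_w
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total_w
    # Weighted spread of x and weighted co-variation of x and y
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    if sxx == 0:
        raise ValueError("need at least two distinct x values with nonzero weight")
    sxy = sum(w * (x - mean_x) * (y - mean_y)
              for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative placeholder data (NOT real database numbers):
ply = [1, 2, 3, 4, 5]
draw_pct = [30.0, 31.0, 32.0, 33.0, 50.0]  # last point: a low-sample outlier
games = [1000, 800, 500, 200, 2]           # sample size at each ply

slope, intercept = weighted_line_fit(ply, draw_pct, games)
unweighted_slope, _ = weighted_line_fit(ply, draw_pct, [1] * len(ply))
print(f"weighted slope:   {slope:.2f} % per ply")
print(f"unweighted slope: {unweighted_slope:.2f} % per ply")
```

With these made-up numbers the weighted slope stays close to the trend of the first four points, while the unweighted slope is pulled sharply upward by the last point, which rests on only 2 games.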

@ bikeboyjames:
This study is about openings, not endgames. Of course if you can mate with a castle alone, then you can also mate with a castle and bishop.
@ Fiveofswords:
I agree. However, I believe that the fact that the 365chess database mixes together games from rank amateurs to grandmasters is better for such a study of drawishness, since if all the players sampled are very good, then the spread of outcomes will be too small. That's like testing a lot of sharp, hardworking students on a test designed for the average student: all those elite students will score so close to 100% that it becomes too difficult to make judgments on the types of questions.
It's a nice coincidence that somebody resurrected this thread today since I'm still working on this study and I might have some more results later today.
One goal of mine in this study is to reduce those effects of noise from low sample sizes toward the end of a line of a given variation, in other words, to keep those wild fluctuations under control at the right-hand ends of the graphs, which are misleading. An obvious way to do this is to progressively reduce the effects of the numerical outcomes as the games progress, so that instead of using say an average, a *weighted* average is used. The obvious way to do this is to put a scale factor in front of each term being used in the average, a factor that starts at 1 at the start of the opening and ends at 0 at the end of the opening.
Fortunately, there already exists such a natural and obvious scale factor: the ratio of the number of games in the database at any given time t (for the opening being studied) to the original (= maximum) number of games being studied at the start of the opening. For example, for the Najdorf Sicilian opening, the basic Sicilian opening starts with 674,006 games in the database, and as each move ensues, the number of the games in the sample diminishes.
Then, on the 2nd move of this opening (ignoring 1. e4 and starting at 1...c5), we're looking at the move 2. Nf3, which has 502,659 games in the database, compared to the original 674,006 games, a ratio of 502,659/674,006 = 0.746 = 74.6%. By the time you get down to only 2 games in the database, this ratio (which I'm calling N(t)/N0) goes down to virtually 0 (2/674,006 = 0.0%, approximately), which means that multiplying by that ratio as a weight causes that result to be virtually ignored in the weighted average, which is what we want. Below is a chart that shows how this ratio changes for the Najdorf Sicilian line as time (= number of ply) progresses. As designed, it starts at 1 and drops to 0. I also included the first three numerical values to show where this chart is coming from.
This is just a test of my foundations for creating a weighted average. Hopefully tonight I'll be able to post the interesting stuff (the actual measurements per opening).

1...c5 | 674006 | 36.1% | 29.5% | 34.4% | 1.000 |
2. Nf3 | 502659 | 36.5% | 29.9% | 33.6% | 0.746 |
2...d6 | 203048 | 37.2% | 29.2% | 33.7% | 0.301 |
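The N(t)/N0 column in the three rows above can be reproduced with a few lines of Python. This is only a sketch: the function names `sample_weights` and `weighted_average` are hypothetical, and dividing by the sum of the weights in the weighted average is my assumption about how the final average would be normalized.

```python
def sample_weights(game_counts):
    """Return N(t)/N0 for each position, where N0 is the first count."""
    n0 = game_counts[0]
    if n0 <= 0:
        raise ValueError("the starting sample size must be positive")
    return [n / n0 for n in game_counts]


def weighted_average(values, weights):
    """Weighted mean of values (assumption: normalized by the sum of weights)."""
    total_w = sum(weights)
    if total_w <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * v for w, v in zip(weights, values)) / total_w


# Game counts taken from the three rows above: 1...c5, 2. Nf3, 2...d6
counts = [674006, 502659, 203048]
weights = sample_weights(counts)
print([round(w, 3) for w in weights])  # [1.0, 0.746, 0.301]

# A position with only 2 games left gets a weight that is practically zero
print(round(2 / counts[0], 6))  # 3e-06
```

Any per-ply quantity (for example, the change in draw percentage at each ply) could then be passed to `weighted_average` together with these weights, so that positions near the end of the line barely count.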
Here are the results of a little study I did today that I'd been planning to do for some months. This is an attempt to objectively measure the drawishness (or sharpness) of any given opening. I used the statistics from 365chess for four different openings:
(1) Sicilian Defense, Najdorf Variation, Opovcensky Variation - notoriously sharp
(2) Dutch Defense - notoriously sharp
(3) Petroff's Defense, Classical Attack, Jaenisch Variation - notoriously drawish
(4) French Defense, Exchange Variation - notoriously drawish
For each of the above openings I looked only at the most popular line and recorded the number of White's wins, Black's wins, and draws out to 17 moves (= 34 ply). It appears that when a region on the 365chess graph drops below about 15%, there is no more room to include the text describing the percentage. This created a problem when recording statistics, but it also happened to make the most convenient threshold for measuring drawishness: the last ply number whose text would still fit on the graph corresponds well with conventional wisdom about degree of drawishness. The sooner the graph region hit that threshold, the more drawish the opening was.
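The threshold measure described above could be sketched like this. The function name `first_ply_below` is hypothetical, the 15% default is only the approximate cutoff mentioned above, the numbers are invented for illustration, and counting ply from 1 is my assumption.

```python
def first_ply_below(white_pct, black_pct, threshold=15.0):
    """Return the first ply (counting from 1) at which either side's win
    percentage drops below the threshold, or None if it never does."""
    for ply, (white, black) in enumerate(zip(white_pct, black_pct), start=1):
        if white < threshold or black < threshold:
            return ply
    return None


# Illustrative placeholder percentages (NOT real 365chess numbers)
white_wins = [36.0, 30.0, 22.0, 14.0]
black_wins = [34.0, 28.0, 16.0, 12.0]
print(first_ply_below(white_wins, black_wins))  # 4
```

A smaller result means the opening reached the "unreadable text" region sooner, which under this measure means it is more drawish.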
All sound openings have the same general behavior: as the game progresses, the likelihood of a draw tends to increase toward 100% (the plotted red dots move upward) while the likelihood of either side winning tends to diminish toward 0% (the blue and yellow dots move downward). The only difference is how quickly the game becomes drawish. I still haven't figured out a favorite and elegant way to measure this difference, but these differences are fairly obvious from the graphs and numbers:
Sharp openings tend to have the lines for wins, losses, and draws tightly clustered for many moves, whereas drawish openings tend to have the lines spread out quickly.
Sharp openings tend not to hit the lower text size limit (about 15%) on the chart until about 30 ply, whereas drawish openings hit that limit within 15-20 ply: in about half the time of sharp openings.
Sharp openings tend to increase their draw percentage at only about 2.0% per move, whereas drawish openings tend to increase their draw percentage at about 2.5% per move. (Unfortunately, one drawish opening I chose, the Exchange French, had a statistical anomaly suddenly flare up after it became drawish, which threw off my statistics. I'm fairly sure this doesn't usually happen, and that more examples of openings would average out that anomaly.)
Statistical outcomes per opening
Sicilian Defense, Najdorf Variation, Opovcensky Variation
total difference in draw percentage after 34 ply = 37.2%
average change in draw percentage per ply over the first 34 ply = 1.1%
average change in draw percentage per move over the first 17 full moves = 2.19%
number of ply before the first unreadably small text for percentage of win or lose on the chart = 31

Dutch Defense
total difference in draw percentage after 34 ply = 33.5%
average change in draw percentage per ply over the first 34 ply = 1.0%
average change in draw percentage per move over the first 17 full moves = 1.97%
number of ply before the first unreadably small text for percentage of win or lose on the chart = 31

Petroff's Defense, Classical Attack, Jaenisch Variation
total difference in draw percentage after 34 ply = 42.3%
average change in draw percentage per ply over the first 34 ply = 1.24%
average change in draw percentage per move over the first 17 full moves = 2.49%
number of ply before the first unreadably small text for percentage of win or lose on the chart = 19

French Defense, Exchange Variation
total difference in draw percentage after 34 ply = 29.5%
average change in draw percentage per ply over the first 34 ply = 0.9%
average change in draw percentage per move over the first 17 full moves = 1.74%
number of ply before the first unreadably small text for percentage of win or lose on the chart = 14
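For anyone who wants to check the arithmetic, the per-ply and per-move averages are just the total change in draw percentage divided by 34 ply and by 17 moves. Here is a small sketch using the four totals listed above; the function name `average_changes` and the short dictionary labels are my own.

```python
PLY = 34    # number of ply examined per opening
MOVES = 17  # the same span in full moves

# Total change in draw percentage after 34 ply, from the figures above
totals = {
    "Najdorf Sicilian": 37.2,
    "Dutch": 33.5,
    "Petroff": 42.3,
    "Exchange French": 29.5,
}


def average_changes(total, ply=PLY, moves=MOVES):
    """Return (average change per ply, average change per move), rounded."""
    return round(total / ply, 2), round(total / moves, 2)


for name, total in totals.items():
    per_ply, per_move = average_changes(total)
    print(f"{name}: {per_ply}% per ply, {per_move}% per move")
```

Since a move is two ply, the per-move figure should always come out at twice the per-ply figure, which makes a handy sanity check on any of the numbers.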
This little study and its approach can therefore be used to clear up various debates I've seen on this site, such as which openings are sharpest or dullest, how far it is useful to memorize openings, whether 1. e4 versus 1. d4 is "better," whether there exists an opening that forces a win for White, etc.