Remove [%eval] from PGN database

prof_frink

Forgive me if this is in the wrong forum, but I didn't see a 'computer chess' forum, so I'm posting it here.

I've got a database of games in PGN format and some of the games contain [%eval] computer evaluations. I'd like to remove them from the database without deleting any of the other comments.

What would you say is the best way of doing so? Is there a tool/utility that will allow me to do this, or can this be done with a text editor? (The database in question is a little over 1 million games.)

skelos

I don't know of an existing tool. One million games is a lot to load into an editor; I'd probably prefer to avoid that. A few lines of perl or Python or (your-favourite-language) would do the trick.

Does [%...] ever split over two lines?

What operating system(s) do you have this data on? That changes which tools are installed by default. A natural Linux/Unix solution will work on OS X/macOS, but needs non-standard (though available) additional tools on Windows. As for how it might be done natively on Windows without additional software, I'm the wrong person to answer.

prof_frink

Hi skelos,

 

To the best of my knowledge, it doesn't split over two lines.

 

I'm using Windows. (I've really gotta familiarize myself with Linux one of these days.)

 

I tried using ChessBase earlier, but using the 'Delete Fritz commentary' option removes time control information as well, which I'd like to keep. I figure that if I really wanted an evaluation of a game, I can run one myself, and it will probably be more accurate.

 

I'll try SCID later today, but I'm thinking a text editor of some sort might be the solution.

skelos

If the [%...] never splits over two lines it's a little easier. Can you post a sample game? Any solution I provide would probably be in Perl, but that's free to install on Windows and runs well there despite originating last century (oh, I feel old) on Unix. Before Linux even existed.

Or with a sample someone with any Microsoft language (C#, Visual Basic, whatever) might be able to whip something up.

skelos

A text editor and global search and replace (with sophisticated enough matching) would work, but most text editors aren't written for huge files. Still, if you only have to do it once and get it right, it's done. Um, do make backups, but you knew that, right?

prof_frink

 OK, so here's two samples of the database. This game just has the %eval numbers:

 

[Event "Rated Bullet game"]
[Site "https://lichess.org/hca0mb9v"]
[White "LEGENDARY_ERFAN"]
[Black "Mariss"]
[Result "0-1"]
[UTCDate "2013.01.01"]
[UTCTime "00:15:38"]
[WhiteElo "1182"]
[BlackElo "1457"]
[WhiteRatingDiff "-30"]
[BlackRatingDiff "+5"]
[ECO "C00"]
[Opening "French Defense #2"]
[TimeControl "60+0"]
[Termination "Normal"]

1. e4 { [%eval 0.2] } 1... e6 { [%eval 0.13] } 2. Bc4 { [%eval -0.31] } 2... d5 { [%eval -0.28] } 3. exd5 { [%eval -0.37] } 3... exd5 { [%eval -0.31] } 4. Bb3 { [%eval -0.33] } 4... Nf6 { [%eval -0.35] } 5. d4 { [%eval -0.34] } 5... Be7 { [%eval 0.0] } 6. Nf3 { [%eval 0.0] } 6... O-O { [%eval -0.08] } 7. Bg5 { [%eval -0.19] } 7... h6 { [%eval -0.29] } 8. Bxf6 { [%eval -0.36] } 8... Bxf6 { [%eval -0.37] } 9. O-O { [%eval -0.36] } 9... c6 { [%eval -0.12] } 10. Re1 { [%eval -0.17] } 10... Bf5 { [%eval -0.04] } 11. c4?! { [%eval -0.67] } 11... dxc4 { [%eval -0.5] } 12. Bxc4 { [%eval -0.77] } 12... Nd7?! { [%eval -0.1] } 13. Nc3 { [%eval 0.0] } 13... Nb6 { [%eval 0.0] } 14. b3?! { [%eval -0.76] } 14... Nxc4 { [%eval -0.49] } 15. bxc4 { [%eval -0.65] } 15... Qa5 { [%eval -0.55] } 16. Rc1 { [%eval -0.79] } 16... Rad8 { [%eval -0.78] } 17. d5?? { [%eval -5.41] } 17... Bxc3 { [%eval -5.42] } 18. Re5? { [%eval -7.61] } 18... Bxe5 { [%eval -7.78] } 19. Nxe5 { [%eval -7.72] } 19... cxd5 { [%eval -7.81] } 20. Qe1? { [%eval -9.29] } 20... Be6?? { [%eval 3.71] } 21. Rd1?? { [%eval -12.34] } 21... dxc4 { [%eval -12.71] } 22. Rxd8?! { [%eval #-1] } 22... Rxd8?! { [%eval -13.06] } 23. Qc3?! { [%eval #-2] } 23... Qxc3?! { [%eval #-4] } 24. g3 { [%eval #-3] } 24... Rd1+?! { [%eval #-4] } 25. Kg2 { [%eval #-4] } 25... Qe1?! { [%eval #-4] } 26. Kf3 { [%eval #-3] } 26... Qxe5 { [%eval #-2] } 27. Kg2 { [%eval #-2] } 27... Bd5+?! { [%eval #-2] } 28. Kh3 { [%eval #-1] } 28... Qh5# 0-1

 

And this one has %eval and %clk values:

 

[Event "Rated Standard game"]
[Site "https://lichess.org/tKtuqF34"]
[White "NotReallyNow"]
[Black "Chessares"]
[Result "1-0"]
[UTCDate "2018.02.28"]
[UTCTime "23:00:01"]
[WhiteElo "1670"]
[BlackElo "1702"]
[WhiteRatingDiff "+12"]
[BlackRatingDiff "-11"]
[ECO "C41"]
[Opening "Philidor Defense #2"]
[TimeControl "600+0"]
[Termination "Normal"]
[LichessId "tKtuqF34"]

1. e4 { [%eval 0.03] [%clk 0:10:00] } 1... e5 { [%eval 0.24] [%clk 0:10:00] } 2. Nf3 { [%eval 0.25] [%clk 0:09:58] } 2... d6 { [%eval 0.28] [%clk 0:09:57] } 3. Nc3 { [%eval 0.19] [%clk 0:09:56] } 3... Bg4 { [%eval 0.49] [%clk 0:09:48] } 4. h3 { [%eval 0.42] [%clk 0:09:50] } 4... Bh5 { [%eval 0.89] [%clk 0:09:46] } 5. Bb5+?! { [%eval 0.31] [%clk 0:09:37] } 5... c6 { [%eval 0.33] [%clk 0:09:02] } 6. Be2 { [%eval 0.23] [%clk 0:09:34] } 6... Be7 { [%eval 0.65] [%clk 0:08:49] } 7. O-O?! { [%eval 0.15] [%clk 0:09:12] } 7... Nf6 { [%eval 0.22] [%clk 0:08:45] } 8. d4 { [%eval 0.2] [%clk 0:09:04] } 8... exd4?! { [%eval 0.74] [%clk 0:08:33] } 9. Nxd4 { [%eval 0.4] [%clk 0:09:00] } 9... O-O?? { [%eval 4.44] [%clk 0:08:27] } 10. Bxh5 { [%eval 4.39] [%clk 0:08:58] } 10... c5? { [%eval 5.47] [%clk 0:08:16] } 11. Nb3?! { [%eval 4.76] [%clk 0:08:52] } 11... Nc6 { [%eval 4.87] [%clk 0:07:53] } 12. Bf3 { [%eval 4.66] [%clk 0:08:49] } 12... a6 { [%eval 4.86] [%clk 0:07:44] } 13. Re1 { [%eval 4.75] [%clk 0:06:42] } 13... Ne5 { [%eval 4.95] [%clk 0:07:38] } 14. Bf4 { [%eval 4.82] [%clk 0:06:23] } 14... Nxf3+?! { [%eval 5.52] [%clk 0:06:57] } 15. Qxf3 { [%eval 5.49] [%clk 0:06:21] } 15... b5 { [%eval 5.89] [%clk 0:06:42] } 16. Nd2? { [%eval 4.86] [%clk 0:04:01] } 16... c4?! { [%eval 5.43] [%clk 0:06:28] } 17. e5?! { [%eval 4.64] [%clk 0:03:57] } 17... dxe5 { [%eval 4.51] [%clk 0:06:25] } 18. Bxe5?? { [%eval 0.78] [%clk 0:03:54] } 18... Qxd2 { [%eval 0.75] [%clk 0:06:16] } 19. Re3?! { [%eval 0.01] [%clk 0:03:05] } 19... Qxc2 { [%eval 0.12] [%clk 0:05:22] } 20. Bxf6?! { [%eval -0.73] [%clk 0:02:58] } 20... Bxf6 { [%eval -0.57] [%clk 0:05:21] } 21. Re2 { [%eval -0.75] [%clk 0:02:50] } 21... Qg6 { [%eval -0.66] [%clk 0:05:06] } 22. Ne4?! { [%eval -1.26] [%clk 0:02:46] } 22... Bd4 { [%eval -1.0] [%clk 0:04:29] } 23. Rf1 { [%eval -1.47] [%clk 0:02:34] } 23... Rae8?! { [%eval -0.94] [%clk 0:04:18] } 24. Rfe1 { [%eval -1.4] [%clk 0:02:30] } 24... Ba7? { [%eval 0.7] [%clk 0:03:33] } 25. 
Nf6+ { [%eval 0.62] [%clk 0:02:12] } 25... Qxf6 { [%eval 0.7] [%clk 0:02:56] } 26. Rxe8 { [%eval 0.61] [%clk 0:01:23] } 26... Qxb2?? { [%eval #3] [%clk 0:02:32] } 27. Rxf8+ { [%eval #2] [%clk 0:01:20] } 27... Kxf8 { [%eval #2] [%clk 0:02:30] } 28. Qa8+ { [%eval #1] [%clk 0:01:19] } 28... Bb8 { [%eval #1] [%clk 0:02:28] } 29. Qxb8# { [%clk 0:01:17] } 1-0

 

I'd like to keep the clock info but not the evals.

prof_frink

I guess the game moves are all on one line.

macer75

Looks like you're going to have to employ some people.

Shock_Me
In the first example (%eval only), if you were to remove the square brackets and the text inside them, leaving only the empty curly braces, would the application just ignore the empty curly braces or fail? In the former case, a simple Python script using a regular expression to find and remove every [%eval ...] tag would take out all of them but leave the empty curly braces behind.

If that doesn’t fly, it’s only slightly more difficult to also delete the curly braces, but only if they contain nothing but the %eval. My own awkward solution would be to delete every [%eval ...] tag, then go back through the file and delete every { } that is left with nothing but spaces inside, but I’m sure there are classier ways.
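
The two-pass idea described above can be sketched in a few lines of Python. This is only a sketch under assumptions: the name strip_evals and both patterns are made up for illustration, and it assumes (as stated earlier in the thread) that a [%eval ...] tag never splits over two lines. Note that the second pass would also delete any comment that was already empty before the first pass ran.

```python
import re

# One [%eval ...] tag, plus any spaces or tabs that follow it.
EVAL_TAG = re.compile(r"\[%eval[^\]]*\][ \t]*")
# A comment left with nothing but whitespace inside, plus the spaces before it.
EMPTY_COMMENT = re.compile(r"[ \t]*\{\s*\}")


def strip_evals(text):
    """Remove [%eval ...] tags, then any comments that ended up empty."""
    text = EVAL_TAG.sub("", text)        # pass 1: delete the eval tags
    return EMPTY_COMMENT.sub("", text)   # pass 2: delete the emptied { }


sample = "1. e4 { [%eval 0.2] [%clk 0:10:00] } 1... e5 { [%eval 0.24] } 2. Nf3"
print(strip_evals(sample))
```

On that sample line the clock comment survives and the eval-only comment disappears, giving 1. e4 { [%clk 0:10:00] } 1... e5 2. Nf3. Header lines such as [Event "..."] are untouched because they do not start with [%eval.
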
prof_frink

Sure, I'd be willing to give that a shot. Don't know much about python or coding, but I've got python installed now and I'm familiar with using the command-line in Windows, so I'm up for it.

 

So what kind of command would I use in python to go about doing that? (Yes, you can tell how ignorant I am about these things, lol.)

Shock_Me

Starting from the ground up with Python may not be the most efficient way. Let's go the text editor route with Microsoft Word (other text editors have the same capability, but the specifics differ).

1. Save a copy of the .pgn file as a .txt

2. Open the file in Word

3. From the home tab, open the "Replace" dialog

4. Enable wildcard use by checking the box

5. In the "find what" field, enter \[%eval*\]

(note that the "\" escapes the "[" and "]" which would otherwise have special meaning)

6. Leave the "replace with" field empty

7. Click "replace all"

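8. If you need to remove the leftover curly braces, repeat 5-7. Note that they will not be a bare {}: deleting the tag from { [%eval 0.2] } leaves the spaces behind, so the search string in step 5 needs to allow for them, e.g. \{ @\} (in Word's wildcard mode, @ means one or more of the preceding character) rather than \{\}

9. Save the edited file as .txt and then rename it in Windows to .pgn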

 

As I'm not near my Windows computer ATM, I might have it slightly wrong in the details, but this is the process. If you have a great many files to parse, you might get to the point where learning a bit of Python is easier. You'd need a basic understanding of how to iterate through a list of files, open them, edit them and write them back programmatically.

 

Shock_Me

Oops, I just reread the OP.

This will be very tedious to do 1 million times.

I'll throw something together in Python and get back to you....

skelos

Two substitutions is the way I'd approach it too: first take out the [%eval] part and then remove the empty comments.

For one file (of whatever size) that's easy. For multiple files you need file handling code as well, so either the code takes a list of pathnames or the directory layout needs to be known.

A solution is within reach, definitely. A little more information would help, or perhaps @Shock_Me will come back with something that fits the file arrangement.

skelos

In the Python documentation, "re" is the key you're looking for, I believe:

https://docs.python.org/3/library/re.html

If you're new to Python, do start with a 3.x version. 2.x is getting a bit long in the tooth!

prof_frink

True, that sounds like the logical way of doing it.

 

Yeah, I'd only be doing one file at a time.

 

Oh, and I have 3.x installed. I've run a Python script I downloaded before (for something completely unrelated, sending ISRCs for CD tracks to Musicbrainz), so I'm somewhat familiar with the interface, but creating a script from scratch is currently beyond me. (Learning some basic Python is on my list of short-term life goals, lol.)
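
For the one-file-at-a-time case, a from-scratch script could look roughly like the sketch below. It reads the database one line at a time, so the whole million-game file never has to fit in memory or in an editor, and it writes the result to a new file instead of touching the original. Everything named here (clean_pgn, the two patterns, the file names in the usage note) is hypothetical, and UTF-8 is only a guess at the file's encoding.

```python
import re

# Same two-step idea as discussed in the thread: eval tags, then empty comments.
EVAL_TAG = re.compile(r"\[%eval[^\]]*\][ \t]*")
EMPTY_COMMENT = re.compile(r"[ \t]*\{\s*\}")


def clean_pgn(src_path, dst_path):
    """Copy src_path to dst_path without [%eval ...] tags; return lines changed."""
    changed = 0
    # newline="" keeps the original line endings; surrogateescape passes any
    # bytes that are not valid UTF-8 through unchanged (the encoding is a guess).
    opts = dict(encoding="utf-8", errors="surrogateescape", newline="")
    with open(src_path, "r", **opts) as src, open(dst_path, "w", **opts) as dst:
        for line in src:  # only one line is held in memory at a time
            cleaned = EMPTY_COMMENT.sub("", EVAL_TAG.sub("", line))
            if cleaned != line:
                changed += 1
            dst.write(cleaned)
    return changed
```

Saved as, say, strip_evals.py with a final line such as print(clean_pgn("games.pgn", "games_noeval.pgn")), it can be run from the Windows command line with python strip_evals.py. The original file is left as it was, which doubles as the backup.
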

p89trd

You can do it with all the files in a folder as well - maybe try Stack Exchange or a computer programming site? The guy who wrote the book "Automate the Boring Stuff with Python" has a site and the book is online - you could teach yourself how to do it in a day or so (I used to do this stuff but am far too out of practice to relearn at the moment).

Shock_Me

Agree with all the above. The real "meat" of the Python code would indeed be regular expressions, specifically re.sub, which replaces every match of the search pattern, in this case with an empty string.

What is the structure of this pgn database? As a single .pgn file can contain multiple games, how many individual .pgn files are we dealing with? What is the directory structure, are there multiple directories and subdirectories? Iterating through an elaborate directory tree structure becomes more difficult, but still doable. 

KingArthur127
Oi matey, yes, there is a tool for this

skelos

So the key question is what is the directory structure these files are in? There will be more code to walk that than tweak the files unless they're all in one directory or something, and for one million games (if one game per file) that seems unlikely.

File systems have improved at handling large numbers of files in a directory, but 1,000,000 is still a lot.

skelos

https://docs.python.org/3/library/os.html

os.walk() will probably do most of that, however.

My feeling is it's still only about 20 lines of code (so probably closer to 50) but without a few more details writing anything would be by-guess-and-by-golly.
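
As a rough illustration of the directory-walking part, here is a small sketch built on os.walk(). The function name find_pgn_files and the choice to match on a .pgn extension are assumptions, since the actual layout of the database has not been described.

```python
import os


def find_pgn_files(root):
    """Yield the path of every .pgn file under root, subdirectories included."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a predictable order
        for name in sorted(filenames):
            if name.lower().endswith(".pgn"):  # case-insensitive extension check
                yield os.path.join(dirpath, name)
```

Each path it yields could then be handed to whatever function does the actual substitution, so the walking code and the editing code stay separate.
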

Once the directory structure is known, an important question is whether backups of the files should be left behind. Pro: they're there and (other than the potential number of them) easy to put back as-was.

Con: you double the number of files and probably want to delete them sometime.
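
Either choice is easy to support in code. The sketch below copies the file to a .bak sibling before rewriting it and lets the caller decide whether the copy stays. The name rewrite_with_backup, the .bak suffix and the transform parameter are all illustrative assumptions; transform stands in for whatever function strips the evals from a line. An existing .bak file with the same name would be overwritten.

```python
import os
import shutil


def rewrite_with_backup(path, transform, keep_backup=True):
    """Rewrite the file at path line by line through transform().

    The original is first copied to path + ".bak" and the new content is read
    from that copy, so the old version stays on disk for the whole rewrite.
    """
    backup = path + ".bak"
    shutil.copy2(path, backup)  # copy2 also preserves timestamps
    # UTF-8 is a guess; surrogateescape lets any other bytes pass through as-is.
    opts = dict(encoding="utf-8", errors="surrogateescape", newline="")
    with open(backup, "r", **opts) as src, open(path, "w", **opts) as dst:
        for line in src:
            dst.write(transform(line))
    if not keep_backup:
        os.remove(backup)  # the "con" case: do not leave the extra file behind
    return backup
```

Keeping the backups until the cleaned files have been checked, then deleting them in one go, would get most of the "pro" without living with doubled files forever.
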