Parsing PGN files: disambiguating pieces

Mensch-Maschine

I recently coded a PGN-to-FEN parsing program and discovered some pitfalls in disambiguating pieces. Algebraic notation always states the destination square explicitly, but not the origin square. Usually only one piece of the named type can move to the destination. If more than one can, the notation has to carry extra information to identify the origin square, like the "b" in Nbd7.
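To make the "missing origin" point concrete, here is a minimal Python sketch that splits a single move token into its parts. The regular expression and the names `SAN_RE`, `SanMove` and `parse_san` are made up for illustration. It assumes castling (O-O, O-O-O) and annotation suffixes such as "!" are handled elsewhere.

```python
import re
from typing import NamedTuple, Optional

# Piece letter (absent for pawns), optional origin file and rank, optional
# capture marker, destination square, optional promotion, optional check/mate.
SAN_RE = re.compile(
    r"^(?P<piece>[KQRBN])?"
    r"(?P<from_file>[a-h])?(?P<from_rank>[1-8])?"
    r"(?P<capture>x)?"
    r"(?P<dest>[a-h][1-8])"
    r"(?:=(?P<promo>[QRBN]))?"
    r"(?P<suffix>[+#])?$"
)


class SanMove(NamedTuple):
    piece: str                 # "P" for pawn moves
    from_file: Optional[str]   # disambiguating file, if given
    from_rank: Optional[str]   # disambiguating rank, if given
    dest: str
    capture: bool
    promo: Optional[str]


def parse_san(san: str) -> SanMove:
    """Split one move token into its parts; the origin square stays unknown."""
    m = SAN_RE.match(san)
    if m is None:
        raise ValueError(f"not a plain piece or pawn move: {san!r}")
    return SanMove(
        piece=m.group("piece") or "P",
        from_file=m.group("from_file"),
        from_rank=m.group("from_rank"),
        dest=m.group("dest"),
        capture=m.group("capture") is not None,
        promo=m.group("promo"),
    )


print(parse_san("Nbd7"))   # origin file "b" is known, origin rank is not
print(parse_san("Nd7"))    # no origin information at all
```

Everything the token does not say (the full origin square) has to be worked out from the position, which is where the pitfalls below come from.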

Initially, I thought capturing en passant could present problems, but it is clearly defined by algebraic notation:

Notice that the FEN en passant target square (you can have a look at this using the "share" button in the board) is set to e6 immediately after the black pawn's two-square advance.
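For a parser that writes FEN, the en passant field can be derived from the move just played. The following Python sketch assumes moves are already available as origin and destination squares. The function names `en_passant_target` and `en_passant_victim` are hypothetical. It follows the FEN convention of recording the target square after every two-square pawn advance, whether or not an enemy pawn can actually capture.

```python
def en_passant_target(piece: str, origin: str, dest: str) -> str:
    """FEN en passant field after a move: the square the pawn skipped, or "-"."""
    if piece.upper() != "P":
        return "-"
    rank_from, rank_to = int(origin[1]), int(dest[1])
    if origin[0] == dest[0] and abs(rank_to - rank_from) == 2:
        # The target is the square between origin and destination.
        return f"{origin[0]}{(rank_from + rank_to) // 2}"
    return "-"


def en_passant_victim(origin: str, dest: str) -> str:
    """Square of the pawn removed by an en passant capture.

    The captured pawn stands on the destination file and the capturing
    pawn's origin rank, not on the destination square itself.
    """
    return f"{dest[0]}{origin[1]}"


print(en_passant_target("p", "e7", "e5"))   # e6
print(en_passant_victim("d5", "e6"))        # e5
```

The second helper covers the one real quirk for a parser: after a capture like dxe6 onto an empty square, the pawn to remove is on e5.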

Anyone who's notated a game or gone through a PGN file will know that when two or more pieces of the same kind can move to the same square, a disambiguating file (or, when the pieces share a file, rank) is needed:

Very rarely, both rank and file need to be disambiguated. The only example of this that I can think of is something like this:
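The order of preference in the PGN standard (file first, then rank, then both) can be sketched in a few lines of Python. The function name `disambiguator` and its arguments are hypothetical. It assumes the caller has already collected the squares of the other like pieces that can legally reach the same destination. The three-queen case is an invented illustration.

```python
from typing import Sequence


def disambiguator(origin: str, rivals: Sequence[str]) -> str:
    """Text to insert after the piece letter for a move starting on `origin`.

    `rivals` holds the squares of the other pieces of the same kind and
    colour that can legally move to the same destination.
    """
    if not rivals:
        return ""                    # only one candidate: nothing to add
    if all(r[0] != origin[0] for r in rivals):
        return origin[0]             # the file alone is unique
    if all(r[1] != origin[1] for r in rivals):
        return origin[1]             # files clash, but the rank is unique
    return origin                    # need both file and rank


print(disambiguator("b8", ["f6"]))         # b   (as in Nbd7)
print(disambiguator("a1", ["a5"]))         # 1   (as in R1a3)
print(disambiguator("h4", ["h1", "e4"]))   # h4  (as in Qh4e1)
```

The last call shows why the full square is so rare: it takes at least three like pieces, one sharing the mover's file and another sharing its rank, which in practice means promotions.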

What's probably less well known (at least I didn't know about it until recently) is that when one of the candidate pieces is pinned to its king, no disambiguation is needed (and none is provided in computer output), because the pinned piece's move would be illegal and only one legal move remains. If I were notating an OTB game in which this occurred, I'd probably disambiguate anyway, as I can imagine not being aware of the pin when writing the move down. Computer-generated PGN, however, will not include the disambiguating rank or file, particularly since the PGN standard specifies that PGN files "generated by different PGN programs on the same computing system should be exactly equivalent, byte for byte". That leaves no scope for a program to add disambiguation for pinned pieces as a courtesy.

Why is this important? Well, a PGN parser that converts algebraic notation to FEN or to coordinate notation (which gives both the origin and the destination square of a move) has to detect pins. Even a very minimal parser that knows nothing about the legality of castling, or even whether the king is in check, will have to work out the origin square for some moves this way.

So, a minimal approach to PGN parsing is to ignore pins until two or more pieces remain as candidates for the same move. However, consider the following position, taken from a PGN file from the internet, which broke my initial attempts at parsing:

Without considering pins, both rooks could move to g8. After eliminating every rook that is pinned to the king, neither can move to g8! However, the rook on the back rank is only partially pinned: it can't move off the back rank, but it can move along it. So the partially pinned rook is the only rook that can legally move to g8. As such, it is sometimes necessary to distinguish between partially and wholly pinned pieces.
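One way to handle both kinds of pin is to ask which squares a pinned piece may still occupy, rather than whether it is pinned at all. The Python sketch below does that for rooks only. The board layout is an invented position with the same features as the one described (it is not the position from the game), and all names (`sq`, `pin_line`, `rook_reaches`, `rook_origins`) are hypothetical. It ignores checks and everything else a full legality test would need.

```python
from typing import Dict, List, Optional, Set, Tuple

Square = Tuple[int, int]     # (file 0-7, rank 0-7)
Board = Dict[Square, str]    # uppercase = White, lowercase = Black


def sq(name: str) -> Square:
    return ord(name[0]) - ord("a"), int(name[1]) - 1


def sign(n: int) -> int:
    return (n > 0) - (n < 0)


def pin_line(board: Board, king: Square, piece: Square) -> Optional[Set[Square]]:
    """Squares a pinned piece may still move to, or None if it is not pinned."""
    df, dr = piece[0] - king[0], piece[1] - king[1]
    if piece == king or not (df == 0 or dr == 0 or abs(df) == abs(dr)):
        return None                          # not on a line with the king
    step = (sign(df), sign(dr))
    attackers = "rq" if 0 in step else "bq"  # sliders that use this line
    white = board[piece].isupper()
    line: Set[Square] = set()
    seen_piece = False
    cur = (king[0] + step[0], king[1] + step[1])
    while 0 <= cur[0] < 8 and 0 <= cur[1] < 8:
        if cur == piece:
            seen_piece = True
        elif cur in board:
            other = board[cur]
            if seen_piece and other.isupper() != white and other.lower() in attackers:
                line.add(cur)                # capturing the pinner is allowed
                return line
            return None                      # line is blocked by something else
        else:
            line.add(cur)
        cur = (cur[0] + step[0], cur[1] + step[1])
    return None


def rook_reaches(board: Board, origin: Square, dest: Square) -> bool:
    """Pseudo-legal rook move: straight line, empty path, no own piece on dest."""
    df, dr = dest[0] - origin[0], dest[1] - origin[1]
    if (df == 0) == (dr == 0):
        return False                         # diagonal, or no move at all
    step = (sign(df), sign(dr))
    cur = (origin[0] + step[0], origin[1] + step[1])
    while cur != dest:
        if cur in board:
            return False
        cur = (cur[0] + step[0], cur[1] + step[1])
    target = board.get(dest)
    return target is None or target.isupper() != board[origin].isupper()


def rook_origins(board: Board, king: Square, dest: Square) -> List[Square]:
    """Rooks of the king's colour that can reach dest without breaking a pin."""
    rook = "R" if board[king].isupper() else "r"
    found = []
    for origin, piece in board.items():
        if piece != rook or not rook_reaches(board, origin, dest):
            continue
        allowed = pin_line(board, king, origin)
        if allowed is None or dest in allowed:
            found.append(origin)
    return found


# Invented position: Black Kh8, Rf8, Rg7; White Ra8, Bb2, Kh1; Black plays Rg8.
board = {sq("h8"): "k", sq("f8"): "r", sq("g7"): "r",
         sq("a8"): "R", sq("b2"): "B", sq("h1"): "K"}
print(rook_origins(board, sq("h8"), sq("g8")) == [sq("f8")])   # True
```

Here the g7 rook is pinned along the long diagonal, so g8 is off its pin line, while the f8 rook is pinned along the back rank and g8 lies on that line. A filter that simply drops every pinned rook would be left with no candidates.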

Bishop disambiguations are almost never needed, as two bishops of the same side can only move to the same square if they stand on same-colored squares, which requires a pawn to have been underpromoted. Apart from a few artificial examples and studies, I can't think of a reason to underpromote to a bishop. However, I did find one example while working through a PGN file from the internet of 1 million+ games:

I think this is one of those games played between friends where one player is rated considerably higher and throws in a few silly moves every now and then. At any rate, the game ended in a draw.