
Parsing PGN files: disambiguating pieces
I recently wrote a PGN-to-FEN parser and discovered some pitfalls in disambiguating pieces. Algebraic notation always states the destination square of a move, but not its origin. Frequently, only one piece of the given type can move to the destination. If more than one can, extra information is needed to identify the origin square, like the "b" in Nbd7.
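To make the pieces of a move like Nbd7 concrete, here is a minimal sketch of splitting a move into its parts with a regular expression. The names `SAN_RE` and `split_san` are my own for illustration, and the pattern deliberately ignores castling and annotation glyphs such as "!" or "?".

```python
import re

# Optional piece letter, optional origin file, optional origin rank,
# optional capture marker, destination square, optional promotion,
# optional check or mate suffix. Castling (O-O, O-O-O) is not handled.
SAN_RE = re.compile(
    r"^(?P<piece>[KQRBN])?"
    r"(?P<from_file>[a-h])?"
    r"(?P<from_rank>[1-8])?"
    r"(?P<capture>x)?"
    r"(?P<dest>[a-h][1-8])"
    r"(?:=(?P<promo>[QRBN]))?"
    r"(?P<check>[+#])?$"
)


def split_san(san):
    """Split one algebraic move into named parts; pawns get piece 'P'."""
    match = SAN_RE.match(san)
    if match is None:
        raise ValueError(f"unsupported move text: {san!r}")
    parts = match.groupdict()
    parts["piece"] = parts["piece"] or "P"
    return parts


move = split_san("Nbd7")
# move["piece"] == "N", move["from_file"] == "b", move["dest"] == "d7"
```

The regex engine backtracks, so a plain Nd7 does not mistake the "d" for an origin file: the optional groups give their characters back when the destination square would otherwise fail to match.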
Initially, I thought capturing en passant could present problems, but it is clearly defined by algebraic notation:
Notice that the FEN en passant target square (you can check this using the "share" button on the board) is set to e6 just after the black pawn's two-square advance.
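As a small illustration, the en passant target is the fourth space-separated field of a FEN string. The sketch below reads it and works out which pawn disappears when a capture lands on that square. The function names and the example FEN (the position after 1. d4 a6 2. d5 e5) are my own assumptions, not the game shown above.

```python
def en_passant_target(fen):
    """Return the en passant target square from a FEN string, or None."""
    fields = fen.split()
    return None if fields[3] == "-" else fields[3]


def en_passant_victim(dest, ep_target, white_to_move):
    """Square of the pawn removed by an en passant capture, else None.

    Only call this for pawn captures: the captured pawn sits on the
    destination file, one rank behind the target from the mover's view.
    """
    if ep_target is None or dest != ep_target:
        return None
    rank = int(dest[1])
    return f"{dest[0]}{rank - 1 if white_to_move else rank + 1}"


# Hypothetical example: position after 1. d4 a6 2. d5 e5
fen = "rnbqkbnr/1ppp1ppp/p7/3Pp3/8/8/PPP1PPPP/RNBQKBNR w KQkq e6 0 3"
target = en_passant_target(fen)                     # "e6"
victim = en_passant_victim("e6", target, True)      # "e5" for the move dxe6
```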
Anyone who's notated a game or gone through a PGN file would know that when two or more pieces of the same kind are able to move to the same square, a disambiguation file (or sometimes rank) is needed:
What's probably less well known (at least I didn't know about it until recently) is that when one of the candidate pieces is pinned to the king, no disambiguation is needed (and none is provided in computer output), because the pinned piece's move would be illegal and only one legal move remains. If I were notating an OTB game in which this occurred, I'd probably disambiguate it anyway, since I can imagine not being aware of the pin when writing the move down. In computer-generated PGN, however, the disambiguating rank or file will not be present, particularly since the PGN standard specifies that PGN files "generated by different PGN programs on the same computing system should be exactly equivalent, byte for byte". As such, there is no scope for computer output to include disambiguation information about pinned pieces as a courtesy.
Why is this important? Well, when coding a PGN parser that converts algebraic notation to FEN or to coordinate notation (which gives both the origin and destination squares of a move), pins need to be detected. Even a very minimal PGN parser that knows nothing about the legality of castling, or even whether the king is in check, will have to work out the origin square of some moves by taking pins into account.
So, a minimal approach to PGN parsing is to not worry about pins until two or more pieces need to be disambiguated. However, consider the following position taken from a PGN file from the internet that broke my initial attempts at parsing:
Without considering pins, both rooks could move to g8. After eliminating rooks that are pinned to the king, neither can move to g8! However, the rook on the back rank is only partially pinned: it can't move off the back rank, but it can move along it. In this situation, the partially pinned rook is the only rook available to move to g8. As such, it's sometimes necessary to distinguish between partially and wholly pinned pieces.
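One way to handle partial pins is to keep a pinned piece as a candidate whenever its move runs parallel to the pin line. The sketch below assumes the pins have already been found and are given as a mapping from pinned square to pin direction. The function names and the position are hypothetical, not the one from the PGN file above: Black has a king on h8 and rooks on f8 and g7, while White has a rook on a8 and a bishop on b2.

```python
def sq(name):
    """Convert 'g8' to zero-based (file, rank) coordinates."""
    return (ord(name[0]) - ord("a"), int(name[1]) - 1)


def stays_on_pin_line(src, dest, pin_dir):
    """True if the move src -> dest is parallel to the pin direction."""
    dx, dy = dest[0] - src[0], dest[1] - src[1]
    # Two vectors are parallel exactly when their 2-D cross product is zero.
    return dx * pin_dir[1] - dy * pin_dir[0] == 0


def rook_reaches(board, src, dest):
    """True if a rook on src has a clear rank or file path to dest."""
    if src == dest or (src[0] != dest[0] and src[1] != dest[1]):
        return False
    step_x = (dest[0] > src[0]) - (dest[0] < src[0])
    step_y = (dest[1] > src[1]) - (dest[1] < src[1])
    x, y = src[0] + step_x, src[1] + step_y
    while (x, y) != dest:
        if (x, y) in board:
            return False  # something stands between src and dest
        x, y = x + step_x, y + step_y
    return True


def rook_origins(board, dest, white, pins):
    """Rooks that can go to dest, keeping pinned rooks that stay on the line.

    Does not check what occupies dest; the move text is assumed to be legal.
    """
    rook = "R" if white else "r"
    origins = []
    for src, piece in board.items():
        if piece != rook or not rook_reaches(board, src, dest):
            continue
        pin_dir = pins.get(src)
        if pin_dir is None or stays_on_pin_line(src, dest, pin_dir):
            origins.append(src)
    return origins


board = {sq("h8"): "k", sq("f8"): "r", sq("g7"): "r",
         sq("a8"): "R", sq("b2"): "B", sq("g1"): "K"}
# Directions point away from the black king: f8 is pinned along the back
# rank by the rook on a8, g7 along the long diagonal by the bishop on b2.
pins = {sq("f8"): (-1, 0), sq("g7"): (-1, -1)}
origins = rook_origins(board, sq("g8"), False, pins)  # only the rook on f8
```

The rook on g7 is wholly pinned, because a rook can never move along a diagonal, while the rook on f8 slides along the back rank and stays between the king and the pinning rook.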