Building up a dataset for automatic score sheet recognition

BetweenTheBoard

I am trying to build a dataset to train a convolutional neural network that converts handwritten score sheets into PGN. Does anybody have a (large) collection of score sheet images with their corresponding PGNs, or know of any people or organizations who could provide them? Since score sheet formats vary widely between countries and federations, I'd be interested in a variety of them.

foucheta

Couldn't you use an already-trained network?

The author of the following article mentions 4 datasets for handwritten text recognition. You could train a model that learns to read handwritten text, then apply it to your score sheets:

https://medium.com/@arthurflor23/handwritten-text-recognition-using-tensorflow-2-0-f4352b7afe16

foucheta

Here is a pre-trained HTR (handwritten text recognition) model:

https://github.com/githubharald/SimpleHTR

BetweenTheBoard

Sure, there are plenty of pre-trained networks for plain text recognition. However, the task for chess score sheets is more difficult, and one of the difficulties is the lack of standardization in score sheet layouts. As a first step, you need some method to isolate (and order) the text snippets representing the half-moves.
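To make the ordering part concrete, here is a rough sketch in Python. It assumes the text snippets have already been isolated as bounding boxes (by whatever detector you use), and that the sheet is laid out in blocks of two columns, white's moves on the left and black's on the right, read block by block from left to right. The `Box` type, the function names, and the `gap` threshold are all made up for illustration, and real sheets with skipped cells or slanted scans would need something more robust.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Hypothetical bounding box of one handwritten half-move."""
    x: float        # left edge
    y: float        # top edge
    w: float        # width
    h: float        # height
    text: str = ""  # recognized text, if already available


def group_columns(boxes: List[Box], gap: float) -> List[List[Box]]:
    """Group boxes into columns by their horizontal center.

    A new column starts whenever the center jumps by more than `gap`.
    """
    ordered = sorted(boxes, key=lambda b: b.x + b.w / 2)
    columns: List[List[Box]] = []
    last_center = None
    for box in ordered:
        center = box.x + box.w / 2
        if last_center is not None and center - last_center <= gap:
            columns[-1].append(box)
        else:
            columns.append([box])
        last_center = center
    return columns


def half_move_order(boxes: List[Box], gap: float) -> List[Box]:
    """Return boxes in playing order.

    Assumption: columns alternate white, black, white, black, ...
    and no cell is skipped in the middle of a column.
    """
    columns = group_columns(boxes, gap)
    for column in columns:
        column.sort(key=lambda b: b.y)  # top to bottom

    ordered: List[Box] = []
    for i in range(0, len(columns), 2):
        white = columns[i]
        black = columns[i + 1] if i + 1 < len(columns) else []
        for row in range(max(len(white), len(black))):
            if row < len(white):
                ordered.append(white[row])
            if row < len(black):
                ordered.append(black[row])
    return ordered


# Boxes given in scrambled order, two blocks of (white, black) columns.
sample = [
    Box(60, 20, 30, 10, "Nc6"), Box(150, 0, 30, 10, "Bb5"),
    Box(10, 0, 30, 10, "e4"), Box(200, 0, 30, 10, "a6"),
    Box(10, 20, 30, 10, "Nf3"), Box(150, 20, 30, 10, "Ba4"),
    Box(60, 0, 30, 10, "e5"),
]
moves = [b.text for b in half_move_order(sample, gap=20)]
print(moves)  # ['e4', 'e5', 'Nf3', 'Nc6', 'Bb5', 'a6', 'Ba4']
```

Once the snippets are in this order, each crop can be fed to an HTR model and the results joined into PGN movetext. The hard part that remains is finding the boxes in the first place, since that depends on the sheet layout.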