How Much Do Synthetic Datasets Matter in Handwritten Text Recognition?

2021 
This paper explores the use of synthetic image generators in preparing datasets for training handwritten character recognition models. We examined the most popular deep neural network architectures and presented a method based on an autoencoder architecture and a schematic character generator. As a comparative baseline, we used a classifier trained on the full NIST set of handwritten Latin-alphabet letters. Our experiments showed that a training dataset composed of 80% synthetic images achieves very high model accuracy, nearly the same as a training dataset of 100% handwritten images. Our results show that the cost of creating, gathering, and annotating handwritten datasets can be reduced fivefold, at the price of only a 5% loss in accuracy. The method is therefore beneficial for part of the training process and avoids unnecessary manual annotation work.
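The paper's central experiment is a training set mixed from synthetic and handwritten samples at an 80/20 ratio. The abstract does not include code, so the following PyTorch snippet is only a minimal sketch of that mixing step under stated assumptions: the function name mixed_training_set and the TensorDataset stand-ins for the NIST letters and the generator's output are hypothetical illustrations, not the authors' implementation.

```python
import torch
from torch.utils.data import ConcatDataset, Subset, DataLoader, TensorDataset

def mixed_training_set(real_ds, synthetic_ds, synthetic_fraction=0.8,
                       total=10_000, seed=0):
    """Build a training set drawing the given fraction from synthetic data.

    real_ds / synthetic_ds: any map-style datasets of (image, label) pairs.
    """
    g = torch.Generator().manual_seed(seed)
    n_syn = int(total * synthetic_fraction)
    n_real = total - n_syn
    # Sample without replacement from each pool, then concatenate.
    syn_idx = torch.randperm(len(synthetic_ds), generator=g)[:n_syn]
    real_idx = torch.randperm(len(real_ds), generator=g)[:n_real]
    return ConcatDataset([Subset(synthetic_ds, syn_idx.tolist()),
                          Subset(real_ds, real_idx.tolist())])

# Hypothetical stand-ins: random tensors in place of the NIST handwritten
# letters and the autoencoder/schematic generator's synthetic output.
real = TensorDataset(torch.rand(5_000, 1, 28, 28),
                     torch.randint(0, 26, (5_000,)))
synthetic = TensorDataset(torch.rand(20_000, 1, 28, 28),
                          torch.randint(0, 26, (20_000,)))

train_ds = mixed_training_set(real, synthetic, synthetic_fraction=0.8)
loader = DataLoader(train_ds, batch_size=128, shuffle=True)
```

Varying synthetic_fraction from 0.0 to 1.0 in such a setup would reproduce the kind of ablation the abstract describes, with the 0.8 setting trading a small accuracy loss for a large reduction in manual annotation.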