A survey of character characteristics: a quantitative and qualitative study of English letters in handwriting and fonts

2019 
Previous work on optical character recognition (OCR) of typed forms or scene text often achieves strong results by training on synthetic data created by rendering words from fonts [3] [4] [7]. Similar methods have been used to create handwriting data from fonts, especially for Chinese and Japanese. In English, however, such methods are often applied ad hoc to small, hand-selected sets of handwriting-like fonts, while OCR of typed forms uses much larger font sets [13] [5] [11] [9]. Here, we compare large-scale font datasets with handwritten datasets as a first step toward integrating them into large-scale synthetic data. We first collected large numbers of both ordinary fonts and "handwritten fonts", that is, fonts designed to look like handwriting. We then used these fonts to create a large database covering all digits 0-9 and both uppercase and lowercase letters a-z. Next, we ran a large series of qualitative and quantitative comparisons between these renderings and human handwriting from the full NIST SD 19 dataset, which we use to categorize characters by variant (for example, a seven written with or without a line through it). Our qualitative comparisons are visual: for example, we computed heatmaps for handwritten characters, fonts, and handwritten fonts, which make some differences immediately visible. Our quantitative methods compare statistics such as character orientation, complexity, eccentricity, and density, among others. We find that although handwritten fonts are generally closer to actual handwriting than the full font set, they still retain distinct font characteristics and occupy a space between fonts and handwriting. The statistics, and especially the visualizations, are also interesting in their own right and continue prior work on the quantitative study of writing systems.
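The heatmaps and several of the per-glyph statistics named above can be computed directly from binarized glyph images. The sketch below is one possible way to do this, not the authors' implementation. It assumes each glyph has already been rendered or scanned, binarized (ink = 1, background = 0), and resized to a common grid. Orientation and eccentricity use standard second-order image moments, and "complexity" is approximated by perimetric complexity (squared boundary length over 4π times ink area); the paper's exact definitions may differ. All function names are hypothetical.

```python
import numpy as np


def glyph_heatmap(glyphs):
    """Average same-size binary glyph images into a heatmap.

    Each output pixel is the fraction of samples that have ink at that pixel.
    """
    stack = np.asarray(glyphs, dtype=float) > 0  # shape (n, h, w), True where ink
    return stack.mean(axis=0)


def ink_density(img):
    """Fraction of pixels in the image that are ink."""
    ink = np.asarray(img) > 0
    return np.count_nonzero(ink) / ink.size


def orientation_eccentricity(img):
    """Orientation (radians) and eccentricity from second-order central moments.

    Assumes a binary image. Orientation is the major-axis angle measured from
    the x axis in image coordinates (y grows downward), so pi/4 is a stroke
    running from top-left to bottom-right. Eccentricity is 0 for an isotropic
    blob and 1 for a straight line.
    """
    ys, xs = np.nonzero(np.asarray(img) > 0)
    if xs.size == 0:
        raise ValueError("image contains no ink")
    x = xs - xs.mean()
    y = ys - ys.mean()
    mu20, mu02, mu11 = np.mean(x * x), np.mean(y * y), np.mean(x * y)
    theta = 0.5 * np.arctan2(2 * mu11, mu20 - mu02)
    # Eigenvalues of the 2x2 covariance matrix [[mu20, mu11], [mu11, mu02]].
    half_sum = (mu20 + mu02) / 2
    half_gap = np.sqrt(((mu20 - mu02) / 2) ** 2 + mu11 ** 2)
    lam_major, lam_minor = half_sum + half_gap, half_sum - half_gap
    if lam_major <= 0:  # a single ink pixel has no spread
        return theta, 0.0
    ecc = np.sqrt(np.clip(1 - lam_minor / lam_major, 0.0, 1.0))
    return theta, ecc


def perimetric_complexity(img):
    """Squared boundary length divided by (4 * pi * ink area).

    Boundary length is the number of ink/background transitions between
    4-neighbours, a crude pixel-grid perimeter estimate.
    """
    ink = np.pad(np.asarray(img) > 0, 1)  # pad with background so edge ink counts
    perimeter = (np.count_nonzero(ink[:, 1:] != ink[:, :-1])
                 + np.count_nonzero(ink[1:, :] != ink[:-1, :]))
    area = np.count_nonzero(ink)
    if area == 0:
        raise ValueError("image contains no ink")
    return perimeter ** 2 / (4 * np.pi * area)
```

Comparing font, handwritten-font, and NIST SD 19 samples of the same character then reduces to computing `glyph_heatmap` per group and comparing the distributions of the scalar statistics across groups. Moment-based measures were chosen here because they are insensitive to small pixel noise, which matters when mixing clean font renderings with scanned handwriting.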