Post-correction of OCR Errors Using PyEnchant Spelling Suggestions Selected Through a Modified Needleman–Wunsch Algorithm

2018 
This article describes the efforts of the Vocalizer project development team to correct errors in texts generated by the Tesseract OCR engine. Vocalizer is a device that captures images from books, converts them into plain text with the aid of OCR (Optical Character Recognition) software, post-processes the resulting text, and converts it into speech. The whole process is performed autonomously. In the post-processing step, a modified Needleman–Wunsch algorithm was applied to select among the suggestions made by the PyEnchant spellchecker. The results obtained were reasonable, which encourages further research.
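
The selection step can be illustrated with a minimal sketch. The abstract does not specify how the Needleman–Wunsch algorithm was modified, so the code below is an assumption and not the authors' method: it uses the standard global-alignment recurrence and adds a hypothetical substitution score that penalizes visually similar character pairs less than other mismatches. The score values, the `OCR_CONFUSIONS` table, and the function names are all illustrative.

```python
# Illustrative scores (assumed; the abstract gives no values).
MATCH = 2
MISMATCH = -1
GAP = -2
CONFUSION = 1  # softer penalty for characters OCR plausibly confuses

# Hypothetical examples of visually similar character pairs.
OCR_CONFUSIONS = {
    frozenset("l1"),
    frozenset("o0"),
    frozenset("ce"),
    frozenset("5s"),
}


def substitution_score(a, b):
    """Score for aligning character a with character b."""
    if a == b:
        return MATCH
    if frozenset((a, b)) in OCR_CONFUSIONS:
        return CONFUSION
    return MISMATCH


def nw_score(source, target):
    """Needleman-Wunsch global alignment score, keeping two rows of the table."""
    previous = [j * GAP for j in range(len(target) + 1)]
    for i, a in enumerate(source, start=1):
        current = [i * GAP]
        for j, b in enumerate(target, start=1):
            current.append(max(
                previous[j - 1] + substitution_score(a, b),  # align a with b
                previous[j] + GAP,                           # a aligned to a gap
                current[j - 1] + GAP,                        # b aligned to a gap
            ))
        previous = current
    return previous[-1]


def pick_suggestion(ocr_word, suggestions):
    """Return the suggestion that aligns best with the OCR output.

    Falls back to the OCR word itself when there are no suggestions.
    """
    if not suggestions:
        return ocr_word
    return max(suggestions, key=lambda s: nw_score(ocr_word.lower(), s.lower()))
```

In a pipeline like the one described, the candidate list would presumably come from PyEnchant, for example `enchant.Dict("en_US").suggest(word)` for words that fail `check(word)`; the sketch takes the list as a parameter so it runs without the library installed. With these assumed scores, `pick_suggestion("he1lo", ["help", "hello", "hell"])` selects `"hello"`, because the `1`/`l` substitution is treated as a likely OCR confusion.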