Post-correction of OCR Errors Using PyEnchant Spelling Suggestions Selected Through a Modified Needleman–Wunsch Algorithm
2018
This article describes the efforts of the Vocalizer project development team to correct errors in text generated by the Tesseract OCR (Optical Character Recognition) engine. Vocalizer is a device that captures images from books, converts them into plain text with OCR software, post-processes the resulting text, and converts it into speech. The whole process is performed autonomously. In the post-processing step, a modified Needleman-Wunsch algorithm was applied to select among the suggestions made by the PyEnchant spellchecker. The results obtained were reasonable, which encourages further research.
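To make the selection step concrete, the following is a minimal sketch that ranks candidate corrections by global alignment score against the OCR token. It uses the standard Needleman-Wunsch recurrence, not the modified variant, whose details are not given in this abstract. The function names `nw_score` and `best_suggestion` and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of a and b (standard Needleman-Wunsch).

    The scoring values are illustrative assumptions, not the paper's.
    """
    # prev holds the previous row of the dynamic-programming table.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def best_suggestion(word, candidates):
    """Return the candidate that aligns best with the OCR token.

    Falls back to the original token when there are no candidates.
    Ties are resolved in favour of the earliest candidate.
    """
    if not candidates:
        return word
    return max(candidates, key=lambda c: nw_score(word, c))


if __name__ == "__main__":
    # Hypothetical OCR token with a digit in place of a letter.
    print(best_suggestion("bo0k", ["back", "book", "boot"]))
```

In practice the candidate list could come from PyEnchant, for example `enchant.Dict("en_US").suggest(word)`, with the dictionary language being an assumption about the deployment.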
Keywords:
- Correction