An Optical Character Recognition Technique for Devanagari Script Using Convolutional Neural Network and Unicode Encoding
2021
This paper describes an optical character recognition (OCR) technique for converting scanned images of Sanskrit text written in Devanagari into digital documents. The segmentation mechanism, adapted from existing literature, identifies and separates the upper and lower modifiers of a character, and it also recognizes fused Devanagari letters. The segmented characters are fed to a convolutional neural network classifier trained on a dataset of about 1.2 lakh (120,000) images belonging to 85 classes for the core part of a character. Each character from the segmentation phase is classified and mapped to its Unicode representation, and these Unicode values are concatenated to reconstruct the word. By keeping track of the spaces between words and lines, the document can be reconstructed in an editable format.
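The final step, mapping each predicted class to Unicode and joining the results into words and lines, might look roughly like the sketch below. It is only an illustration under stated assumptions: the label names, the `LABEL_TO_CODEPOINT` table, and the `reconstruct_word` and `reconstruct_line` helpers are hypothetical and not taken from the paper, and the table covers only a handful of code points rather than the 85 classes. The code points themselves are the standard ones from the Unicode Devanagari block.

```python
# Hypothetical label-to-code-point table (assumption: the paper's actual
# 85 class labels are not given in the abstract).
LABEL_TO_CODEPOINT = {
    "ka": 0x0915,        # DEVANAGARI LETTER KA
    "ma": 0x092E,        # DEVANAGARI LETTER MA
    "la": 0x0932,        # DEVANAGARI LETTER LA
    "sign_aa": 0x093E,   # DEVANAGARI VOWEL SIGN AA
    "sign_u": 0x0941,    # DEVANAGARI VOWEL SIGN U (lower modifier)
    "sign_e": 0x0947,    # DEVANAGARI VOWEL SIGN E (upper modifier)
    "anusvara": 0x0902,  # DEVANAGARI SIGN ANUSVARA (upper modifier)
}


def reconstruct_word(segments):
    """Join predicted labels into one word.

    `segments` is a list of (core_label, modifier_labels) pairs in reading
    order, where modifier_labels lists the upper/lower modifiers that the
    segmentation step attached to that core character.
    """
    chars = []
    for core_label, modifier_labels in segments:
        # The core character comes first, followed by its modifiers.
        chars.append(chr(LABEL_TO_CODEPOINT[core_label]))
        for modifier in modifier_labels:
            chars.append(chr(LABEL_TO_CODEPOINT[modifier]))
    return "".join(chars)


def reconstruct_line(words):
    """Join reconstructed words with single spaces to form one text line."""
    return " ".join(reconstruct_word(word) for word in words)


if __name__ == "__main__":
    # Three core characters with no modifiers: KA, MA, LA.
    print(reconstruct_word([("ka", []), ("ma", []), ("la", [])]))
    # A core character with a lower modifier, then one with an upper modifier.
    print(reconstruct_line([[("ka", ["sign_u"])], [("ma", ["sign_e"])]]))
```

Storing modifiers alongside their core character keeps the output in Unicode's logical order, where a dependent sign follows the consonant it attaches to regardless of where it is drawn on the page.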