An Optical Character Recognition Technique for Devanagari Script Using Convolutional Neural Network and Unicode Encoding

Vamsi Krishna Kikkuri, Pavan Vemuri, Srikar Talagani, Yashwanth Thota, Jayashree Nair

2021

This paper describes an optical character recognition (OCR) technique for converting scanned images of Sanskrit text written in Devanagari into digital documents. The segmentation mechanism, adapted from existing literature, identifies and separates the upper and lower modifiers of a character and also recognizes fused Devanagari letters. The segmented characters are fed to a convolutional neural network classifier trained on a dataset of about 1.2 lakh (120,000) images belonging to 85 classes for the core part of a character. Each character from the segmentation phase is classified and mapped to its Unicode representation, and these Unicode values are concatenated to reconstruct the word. By keeping track of the spaces between words and lines, the document can be reconstructed in an editable format.
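
The final step, mapping predicted classes to Unicode and joining them into words, might be sketched as follows. This is a minimal illustration, not the paper's implementation: the label names (`"ka"`, `"aa"`, and so on), the tiny lookup tables, and the functions `char_to_unicode`, `build_word`, and `build_line` are all hypothetical, and joining the parts of a fused letter with the virama (U+094D) is an assumption about how fused letters would be encoded. Only the Devanagari code points themselves come from the Unicode standard.

```python
# Hypothetical class labels mapped to Devanagari code points.
# The paper's actual 85-class label set is not given in the abstract.
CORE = {"ka": 0x0915, "na": 0x0928, "ma": 0x092E, "ra": 0x0930, "la": 0x0932}
MODIFIER = {"aa": 0x093E, "i": 0x093F, "e": 0x0947, "anusvara": 0x0902}
VIRAMA = 0x094D  # assumption: joins the consonants of a fused letter


def char_to_unicode(core_labels, modifier_labels=()):
    """Convert one segmented character to a Unicode string.

    core_labels: predicted labels for the core part; more than one
                 label means a fused letter.
    modifier_labels: predicted labels for upper/lower modifiers.
    """
    code_points = []
    for i, label in enumerate(core_labels):
        if i > 0:
            code_points.append(VIRAMA)
        code_points.append(CORE[label])
    code_points.extend(MODIFIER[m] for m in modifier_labels)
    return "".join(chr(cp) for cp in code_points)


def build_word(segments):
    """Concatenate the characters of one word, in reading order."""
    return "".join(char_to_unicode(core, mods) for core, mods in segments)


def build_line(words):
    """Join reconstructed words with single spaces."""
    return " ".join(build_word(w) for w in words)


kamala = [(["ka"], []), (["ma"], []), (["la"], [])]
naama = [(["na"], ["aa"]), (["ma"], [])]
print(build_line([kamala, naama]))
```

Keeping the core and modifier tables separate mirrors the abstract's description of a classifier for the core part of a character, with modifiers handled by segmentation; a real system would also need to track line breaks to rebuild the full document.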
