Extracting Old Persian cuneiform font out of noisy images (handwritten or inscription)

2017 
Optical character recognition (OCR) is the process of converting text in a digital image into encoded, machine-readable text. This paper deals with extracting the texts inscribed on Achaemenid inscriptions, and presents the first good-quality application of OCR to the recognition of Achaemenid scripts. Of the various approaches to character recognition, we chose the open-source Tesseract engine for segmentation, learning and classification. Because the inscriptions contain noise (such as cracks in the stone), the paper applies image processing techniques to remove it. The system's final output includes the extracted cuneiform font, Persian and English transcriptions of the sentences, the pronunciation of each sentence, and Persian and English translations of a substantial number of the extracted words, which helps us better understand how people spoke in that era. The results reported in the validation and results section indicate that the system copes well with the recognition of cuneiform characters, classifying the characters of the test data with about 92% accuracy. These results are promising and could enable and improve natural language processing (NLP) in this area.
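The abstract does not name the image processing techniques used to remove noise. As an illustration only, the sketch below assumes a plain median filter followed by a fixed threshold, a common way to suppress small specks before an image is passed to an OCR engine. The function names `median_filter` and `binarize`, the `radius` and `threshold` parameters, and the list-of-rows image representation are all hypothetical and are not taken from the paper.

```python
from statistics import median


def median_filter(image, radius=1):
    """Replace each pixel with the median of its neighbourhood.

    `image` is a 2D grayscale image given as a list of rows of ints.
    The window is (2*radius+1) squared, clipped at the image borders.
    This is an assumed denoising step, not the paper's stated method.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                image[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            row.append(int(median(window)))
        result.append(row)
    return result


def binarize(image, threshold=128):
    """Map pixels below `threshold` to 0 (ink) and the rest to 255 (background)."""
    return [[0 if pixel < threshold else 255 for pixel in row] for row in image]


# A light background with one isolated dark speck (for example, a small pit in the stone).
speck = [[200] * 5 for _ in range(5)]
speck[2][2] = 0
cleaned = binarize(median_filter(speck))

# A three-pixel-wide dark stroke, which the filter should leave in place.
stroke = [[200, 0, 0, 0, 200] for _ in range(5)]
kept = binarize(median_filter(stroke))
```

In this sketch an isolated speck is smaller than the filter window and is removed, while a stroke wider than the window survives. Long cracks in real inscriptions would likely need stronger or different processing than this.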