Word classification in bilingual printed documents

2012 
In this paper, we propose a method for identifying Arabic words in printed documents that mix Arabic and Latin scripts. The method first applies a statistical and geometric analysis to segment the document into words. Structural features then describe each extracted word; these include descenders (jambs), diacritical dots, connected components, ascenders (hamps)… These features form the descriptor vector for each word. Neural networks classify the extracted words into two classes, Arabic or Latin. We present the classification results and discuss possible improvements.
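As an illustration only, the structural description of a word might be sketched as follows. This is not the authors' implementation: the function names (`connected_components`, `word_features`), the central-band limits `top` and `base`, and the `dot_max_area` threshold are all hypothetical, and the toy image is invented for the example. The sketch counts connected components, treats very small components as dot-like candidates, and flags larger components that extend above or below an assumed central band.

```python
from collections import deque


def connected_components(img):
    """Group 8-connected foreground pixels (value 1) into components."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for r in range(h):
        for c in range(w):
            if img[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, pixels = deque([(r, c)]), []
                while queue:
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    # Visit the 8 neighbours of the current pixel.
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and img[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                comps.append(pixels)
    return comps


def word_features(img, top, base, dot_max_area=4):
    """Return [n_components, n_dot_like, n_ascenders, n_descenders].

    top and base are assumed row indices bounding the central band;
    dot_max_area is an assumed size threshold for dot-like components.
    """
    comps = connected_components(img)
    dots = [p for p in comps if len(p) <= dot_max_area]
    bodies = [p for p in comps if len(p) > dot_max_area]
    ascenders = sum(1 for p in bodies if min(y for y, _ in p) < top)
    descenders = sum(1 for p in bodies if max(y for y, _ in p) > base)
    return [len(comps), len(dots), ascenders, descenders]


# Toy binary word image (invented): '#' is ink, '.' is background.
WORD = [
    "........",
    ".#......",
    ".#..##..",
    ".#..##..",
    ".##.##..",
    "........",
    ".....#..",
]
img = [[1 if ch == "#" else 0 for ch in row] for row in WORD]

features = word_features(img, top=2, base=4)
print(features)  # [3, 1, 1, 0]
```

A vector of this kind would then be passed to a two-class classifier (Arabic or Latin); the abstract does not specify the network architecture, so that step is left out of the sketch.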