Word classification in bilingual printed documents

Sofiene Haboubi, Samia Maddouri, Hamid Amiri

2012

In this paper we propose a method for identifying Arabic words in printed documents that mix Arabic and Latin scripts. The method first applies a statistical and geometric analysis to segment the document into words. Structural features then describe each extracted word; among the features used are descenders (jambs), diacritical points, connected components, and ascenders (hamps). These features form the vector that describes each word. Neural networks classify the extracted words into two classes, Arabic or Latin. We present the results of the classification step and discuss possible improvements.
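The abstract does not specify how the structural features are computed, so the following is only a minimal sketch of how such a feature vector might be assembled from a binarized word image. Everything in it is an assumption made for illustration: the function names (connected_components, word_features), the 8-connectivity, the dot_max_area threshold, and the upper_line and baseline row positions, which are taken as given rather than estimated. It is not the authors' implementation, and the neural network stage is left out.

```python
from collections import deque


def connected_components(img):
    """Group 8-connected foreground pixels (value 1) into components.

    img is a list of rows, each a list of 0/1 values.
    Returns a list of components, each a list of (row, col) pixels.
    """
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from this unvisited pixel.
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def word_features(img, upper_line, baseline, dot_max_area=2):
    """Return [components, dots, ascenders, descenders] for one word image.

    Assumptions (hypothetical, not from the paper):
    - a component with at most dot_max_area pixels is a diacritical point;
    - a larger component reaching above row upper_line counts as an ascender;
    - a larger component reaching below row baseline counts as a descender.
    """
    comps = connected_components(img)
    strokes = [p for p in comps if len(p) > dot_max_area]
    dots = len(comps) - len(strokes)
    ascenders = sum(1 for p in strokes if min(y for y, _ in p) < upper_line)
    descenders = sum(1 for p in strokes if max(y for y, _ in p) > baseline)
    return [len(comps), dots, ascenders, descenders]


# Toy 7x8 "word": a tall stroke on the left, an isolated dot,
# and a body with a tail that drops below the baseline.
TOY_WORD = [
    [0, 1, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1, 1, 0],
    [0, 1, 1, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 1, 0],
]

features = word_features(TOY_WORD, upper_line=2, baseline=4)
print(features)  # [3, 1, 1, 1]
```

A vector of this kind, computed per word, is the sort of input a two-class (Arabic or Latin) neural network classifier could be trained on; real documents would also need the reference lines estimated per text line and thresholds scaled to the font size.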

Keywords:

Scripting language
Natural language processing
Artificial neural network
Geometric analysis
Connected component
Arabic
Text mining
Pattern recognition
Artificial intelligence
Computer science
Statistical analysis
Speech recognition
