Proposal of statistical method of semantic indexing for multilingual documents

2016 
In this paper, we present a statistical approach to semantic indexing of multilingual text documents, based on the conceptual network formalism. We propose to use this formalism as an indexing language that represents the descriptive concepts of a document, which capture its content, together with their weights. Our contribution consists of two steps. In the first step, we extract index terms using the multilingual lexical resource EuroWordNet (EWN). In the second step, we move from a representation by index terms to a representation by index concepts, expressed as a conceptual network. This network is generated using the EWN resource and the association rules model, the latter in an attempt to discover the non-taxonomic (contextual) relations between the concepts of a document. These are latent relations, buried in the text and carried by the semantic context in which concepts co-occur in the document. The proposed approach can be applied to several languages because it relies on a linguistic and statistical process. It is validated by a set of experiments and by comparison with other indexing methods on a corpus from the ad hoc task of the TREC 2001 and 2002 evaluation campaigns. The results show that the proposed indexing approach is encouraging.
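To illustrate the second step, the following minimal sketch shows how association rules over co-occurring concepts might surface contextual (non-taxonomic) relations. It is not the authors' implementation: the function name mine_concept_rules, the choice of a sentence-like "unit" as the co-occurrence window, the thresholds, and the toy concept labels are all assumptions made for illustration. Only the standard definitions of support and confidence are used, and the mapping of terms to EWN concepts is assumed to have been done beforehand.

```python
from collections import Counter
from itertools import combinations


def mine_concept_rules(units, min_support=0.5, min_confidence=0.7):
    """Mine pairwise association rules x -> y from concept co-occurrence.

    units: list of concept lists, one per text unit (hypothetical window,
    e.g. a sentence or paragraph whose terms were already mapped to concepts).
    Returns a sorted list of (antecedent, consequent, support, confidence).
    """
    n = len(units)
    single = Counter()  # number of units containing each concept
    pair = Counter()    # number of units containing each concept pair
    for unit in units:
        concepts = set(unit)  # count each concept once per unit
        single.update(concepts)
        pair.update(frozenset(p) for p in combinations(sorted(concepts), 2))

    rules = []
    for p, count in pair.items():
        support = count / n  # fraction of units containing both concepts
        if support < min_support:
            continue
        a, b = sorted(p)
        for x, y in ((a, b), (b, a)):
            # confidence(x -> y) = count(x and y) / count(x)
            confidence = count / single[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


# Toy data: concept labels are illustrative, not real EWN synset identifiers.
units = [
    ["bank", "loan", "interest"],
    ["bank", "loan"],
    ["bank", "river"],
    ["loan", "interest"],
]
rules = mine_concept_rules(units)
print(rules)
```

In such a sketch, each retained rule could become a weighted, directed edge between two concept nodes of the network, alongside the taxonomic links taken from EWN. How the paper actually weights and combines these relations is not specified in the abstract.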