Arabic, English and French: Three Languages in a Filtering Systems Evaluation Project

2009 
The InFile project (INformation, FILtering, Evaluation) is a cross-language adaptive filtering evaluation campaign sponsored by the French National Research Agency. The project is organized by CEA-LIST, ELDA and the GERIICO laboratory of the University of Lille 3. It has an international scope: it was a pilot track of the CLEF 2008 campaign and a main track of the CLEF 2009 campaign. The corpus is a collection of about 1.4 million newswires in three languages (Arabic, English and French), provided by Agence France-Presse (AFP) and selected from a three-year period. The profile corpus (the set of requests) consists of 51 profiles, of which 30 concern general news and events (national and international affairs, politics, sports, etc.) and 21 concern scientific and technical information. This paper presents the InFile evaluation paradigm in general and focuses in particular on a study of the Arabic part of the corpus. It also discusses the coverage mismatch between the profiles and the Arabic documents, as well as the conceptual and terminological gaps in the transfer between English/French and Arabic.