Multilingual XML-Based Named Entity Recognition for E-Retail Domains.

2002 
We describe the multilingual Named Entity Recognition and Classification (NERC) subpart of an e-retail product comparison system which is currently under development as part of the EU-funded project CROSSMARC. The system must be rapidly extensible, both to new languages and new domains. To achieve this aim we use XML as our common exchange format and the monolingual NERC components use a combination of rule-based and machine-learning techniques. It has been challenging to process web pages which contain heavily structured data where text is intermingled with HTML and other code. Our preliminary evaluation results demonstrate the viability of our approach.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    23
    References
    17
    Citations
    NaN
    KQI
    []