Symbolic and Neural Learning of Named-Entity Recognition and Classification Systems in Two Languages
2002
This work compares two alternative approaches to acquiring named-entity recognition and classification systems from training corpora in two different languages. Named-entity recognition and classification is an important subtask in most language engineering applications, in particular information extraction, where different types of named entities are associated with specific roles in events. Manually constructing rules for recognizing named entities is tedious and time-consuming, so effective methods for acquiring such systems automatically from data are highly desirable. In this paper we compare two popular learning methods on this task: a decision-tree induction method and a multi-layered feed-forward neural network. Particular emphasis is placed on selecting the appropriate data representation for each method and on extracting training examples from unstructured textual data. We compare the performance of the two methods on large corpora of English and Greek texts and present the results. Both methods perform well, and one particularly interesting result is that a simple data representation, which ignores the order of the words within a named entity, yields better results than a more complex representation that preserves word order.
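The abstract contrasts an order-insensitive representation of a named-entity candidate with an order-preserving one. The abstract does not specify the exact features used, so the following is only a minimal sketch of the general idea, assuming a small fixed vocabulary: `bag_of_words` counts each vocabulary word regardless of position, while `positional` gives each word slot its own one-hot block. The function names, the vocabulary, and the example entity are hypothetical illustrations, not the paper's actual encoding.

```python
from collections import Counter
from typing import List


def bag_of_words(entity_tokens: List[str], vocab: List[str]) -> List[int]:
    """Order-insensitive encoding (hypothetical): one count per vocabulary word."""
    counts = Counter(tok.lower() for tok in entity_tokens)
    return [counts[word] for word in vocab]


def positional(entity_tokens: List[str], vocab: List[str], max_len: int) -> List[int]:
    """Order-preserving encoding (hypothetical): one one-hot block per word slot.

    Tokens beyond max_len are dropped; unused slots stay all-zero (padding).
    Out-of-vocabulary tokens leave their slot all-zero.
    """
    index = {word: i for i, word in enumerate(vocab)}
    vec = [0] * (len(vocab) * max_len)
    for pos, tok in enumerate(entity_tokens[:max_len]):
        i = index.get(tok.lower())
        if i is not None:
            vec[pos * len(vocab) + i] = 1
    return vec


if __name__ == "__main__":
    # Hypothetical vocabulary and entity, chosen only for illustration.
    vocab = ["bank", "of", "greece", "national"]
    original = ["National", "Bank", "of", "Greece"]
    shuffled = ["Bank", "of", "Greece", "National"]

    # Identical under the bag-of-words view, different under the positional view.
    print(bag_of_words(original, vocab) == bag_of_words(shuffled, vocab))
    print(positional(original, vocab, 4) == positional(shuffled, vocab, 4))
```

The bag-of-words vector has one dimension per vocabulary word, whereas the positional vector grows with `len(vocab) * max_len`; a compact, order-free input of this kind is one plausible reason a simpler representation can be easier to learn from than an order-preserving one.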
Keywords:
- Word order
- Time delay neural network
- Feedforward neural network
- External Data Representation
- Artificial neural network
- Named entity
- Machine learning
- Named-entity recognition
- Information extraction
- Pattern recognition
- Artificial intelligence
- Computer science
- Neural learning
- Natural language processing
- Induction method
References: 31
Citations: 3