Instance pruning by filtering uninformative words: an information extraction case study

Alfio Massimiliano Gliozzo, Claudio Giuliano, Raffaella Rinaldi

2005

In this paper we present a novel instance pruning technique for Information Extraction (IE). The technique filters uninformative words out of texts, on the assumption that words that are very frequent in the language carry no specific information about the text in which they appear, so they are unlikely to be (part of) relevant entities. Experiments on two benchmark datasets show that computation time can be significantly reduced without any significant decrease in prediction accuracy. We also report an improvement in accuracy for one task.
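
The filtering idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (`build_stoplist`, `prune_instances`), the relative-frequency criterion, and the threshold value are all assumptions made for illustration; the abstract does not state how "very frequent" is measured or where the cut-off lies.

```python
from collections import Counter


def build_stoplist(corpus_tokens, max_rel_freq=0.01):
    """Collect words whose relative frequency in a reference corpus
    exceeds max_rel_freq (hypothetical criterion and threshold)."""
    counts = Counter(tok.lower() for tok in corpus_tokens)
    total = sum(counts.values())
    return {word for word, c in counts.items() if c / total > max_rel_freq}


def prune_instances(tokens, stoplist):
    """Keep only (position, token) pairs whose token is not in the stoplist.
    The surviving pairs are the instances that would be passed on to the
    IE classifier; the filtered ones are assumed not to be entities."""
    return [(i, tok) for i, tok in enumerate(tokens)
            if tok.lower() not in stoplist]


# Toy reference corpus: "the" accounts for 3 of 8 tokens.
corpus = "the cat sat on the mat the end".split()
stoplist = build_stoplist(corpus, max_rel_freq=0.2)

text = "The seminar starts at 3 pm".split()
kept = prune_instances(text, stoplist)
```

In this toy run only "the" exceeds the threshold, so the first token of `text` is dropped and the remaining five tokens keep their original positions. Keeping the positions matters because an IE system typically still needs to map predictions back to offsets in the unfiltered text.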

Keywords:

Computation
Natural language processing
Filter (signal processing)
Word-sense disambiguation
Computer science
Pruning
Pattern recognition
Machine learning
Information extraction
Specific-information
Very frequent
Artificial intelligence
