On Designing an Effective Training Set for Information Extraction

Young-Min Kim, Sa-kwang Song, Sungho Shin, Choong-Nyoung Seon, Seunggyun Hong, Hanmin Jung

2015

Although training set design has received little academic attention relative to its significance, it becomes crucial in big data environments. We propose a novel way to construct a training set for information extraction. The core of the approach is effective data collection that takes into account the trade-off between system quality and annotation difficulty. Instead of collecting data at random, as systems usually do, we use well-defined key expressions as sampling queries. This work is part of an ongoing R&D project; manual annotation is currently in progress, and the approach will be evaluated through the quality of the final system.
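The idea of using key expressions as sampling queries can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `sample_by_key_expressions`, the `per_query` cap, the whole-phrase case-insensitive matching, and the toy corpus are all assumptions made for illustration. The sketch only shows how query-driven selection differs from drawing sentences at random.

```python
import random
import re


def sample_by_key_expressions(sentences, key_expressions, per_query, seed=0):
    """Collect annotation candidates by querying with key expressions.

    Hypothetical helper: for each key expression, keep up to `per_query`
    sentences that contain it as a whole phrase (case-insensitive).
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    selected = {}
    for expr in key_expressions:
        pattern = re.compile(r"\b" + re.escape(expr) + r"\b", re.IGNORECASE)
        matches = [s for s in sentences if pattern.search(s)]
        rng.shuffle(matches)  # avoid always taking the earliest matches
        selected[expr] = matches[:per_query]
    return selected


# Toy corpus and queries, invented for this example.
corpus = [
    "Acme acquired Beta Corp in 2014.",
    "The weather was pleasant.",
    "Gamma Labs was founded by two engineers.",
    "Delta acquired a small startup.",
    "Nothing relevant here.",
]
queries = ["acquired", "founded by"]
candidates = sample_by_key_expressions(corpus, queries, per_query=1)
```

Capping the number of sentences per query is one simple way to limit annotation effort while still covering each expression; the actual balance between system quality and annotation difficulty would depend on the project's own criteria.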

Keywords:

Big data
Data collection
Expression (mathematics)
Training set
Annotation
Information extraction
Work in process
Data mining
Sampling (statistics)
Computer science
Artificial intelligence
Machine learning
