On Designing an Effective Training Set for Information Extraction

2015 
Training set design has received less academic attention than its importance warrants, yet it becomes crucial in big data environments. We propose a novel way to construct a training set for information extraction. The core of the approach is effective data collection that accounts for the trade-off between system quality and annotation difficulty. Instead of collecting data at random, as typical systems do, we use well-defined key expressions as sampling queries. This work is part of an ongoing R&D project; manual annotation is currently in progress, and the approach will be evaluated through the quality of the final system.
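
The idea of using key expressions as sampling queries, rather than drawing sentences at random, might be sketched as follows. This is only an illustration under stated assumptions: the function name `sample_by_key_expressions`, the `per_query` cap, the case-insensitive substring matching, and the toy corpus are all hypothetical and are not taken from the paper, which does not specify how its queries are defined or executed.

```python
import random
import re


def sample_by_key_expressions(sentences, key_expressions, per_query, seed=0):
    """Collect up to `per_query` sentences for each key expression.

    Hypothetical sketch: a key expression is treated as a literal,
    case-insensitive substring query. A sentence is selected at most once,
    even if it matches several expressions.
    """
    rng = random.Random(seed)
    selected = []
    seen = set()  # indices of sentences already chosen
    for expr in key_expressions:
        pattern = re.compile(re.escape(expr), re.IGNORECASE)
        matches = [
            i for i, s in enumerate(sentences)
            if i not in seen and pattern.search(s)
        ]
        rng.shuffle(matches)  # pick at random among the matching sentences
        for i in matches[:per_query]:
            seen.add(i)
            selected.append(sentences[i])
    return selected


# Toy corpus (invented for illustration only).
corpus = [
    "Acme acquired Beta Corp in 2014.",
    "The weather was mild.",
    "Gamma was acquired by Delta.",
    "Stocks fell on Monday.",
    "Epsilon appointed a new CEO.",
]
picked = sample_by_key_expressions(corpus, ["acquired", "appointed"], per_query=2)
```

In this sketch every selected sentence contains at least one key expression, so annotators would see fewer irrelevant sentences than under uniform random sampling; whether that improves final system quality is what the paper's planned evaluation is meant to determine.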