language-icon Old Web
English
Sign In

On the assessment of text corpora

2009 
Classifier-independent measures are important to assess the quality of corpora. In this paper we present supervised and unsupervised measures in order to analyse several data collections for studying the following features: domain broadness, shortness, class imbalance, and stylometry. We found that the investigated assessment measures may allow to evaluate the quality of gold standards. Moreover, they could also be useful for classification systems in order to take strategical decisions when tackling some specific text collections.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    22
    References
    8
    Citations
    NaN
    KQI
    []