On the assessment of text corpora
2009
Classifier-independent measures are important to assess the quality of corpora. In this paper we present supervised and unsupervised measures in order to analyse several data collections for studying the following features: domain broadness, shortness, class imbalance, and stylometry. We found that the investigated assessment measures may allow to evaluate the quality of gold standards. Moreover, they could also be useful for classification systems in order to take strategical decisions when tackling some specific text collections.
Keywords:
- Correction
- Source
- Cite
- Save
- Machine Reading By IdeaReader
22
References
8
Citations
NaN
KQI