The Effects of Generality of Linguistic Features on Corpus Closure

2021 
This study examined how three variables (corpus size, generality of linguistic feature, and genre balance) affect corpus closure. Three linguistic features with distinct levels of generality (nouns, past tense, and conditional clauses) were selected, and their frequencies were measured in COCA (Corpus of Contemporary American English) and in downsized and less balanced samples of it in order to compute correlation coefficients between them. The statistical analyses revealed that larger corpus size, higher generality of the linguistic feature, and well-balanced genres each increase the similarity between COCA and the samples. Since higher similarity means a higher degree of corpus closure, it is postulated that less specific linguistic features and well-balanced texts tend to reach a closure state earlier than their opposites. The results indicate an overall tendency rather than an absolute rule, particularly with regard to the generality of linguistic features. Nevertheless, this study suggests that a primary requirement of a sound corpus is to mirror the variety of features of its population as fully as possible in order to ensure the reliability of any studies based on it. Before planning data collection, researchers should estimate the required size of the target text by considering its genre balance and the generality of the intended lexicon and grammar, so as to judge the feasibility of the study.
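The comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's actual procedure: the abstract does not say which correlation coefficient was used, so Pearson's r is assumed here, and the helper names (`pearson`, `per_section_rates`, `downsample`) and the idea of representing each genre section as a list of tagged tokens are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(vx * vy)


def per_section_rates(sections, is_feature):
    """Relative frequency of a feature (hits per token) in each genre section."""
    return [sum(1 for tok in sec if is_feature(tok)) / len(sec) for sec in sections]


def downsample(sections, fraction, rng):
    """Keep a random fraction of the tokens in every section (at least one token)."""
    return [rng.sample(sec, max(1, int(len(sec) * fraction))) for sec in sections]


def similarity(sections, is_feature, fraction, seed=0):
    """Correlate per-section feature rates in the full corpus and in a downsized sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    full = per_section_rates(sections, is_feature)
    sample = per_section_rates(downsample(sections, fraction, rng), is_feature)
    return pearson(full, sample)
```

Calling `similarity(sections, is_noun, fraction)` for increasing values of `fraction` (with `is_noun` a hypothetical predicate over tagged tokens) would trace how the sample's feature profile approaches that of the full corpus. Under the abstract's findings, a very general feature such as nouns would be expected to approach a high correlation at smaller fractions than a rarer feature such as conditional clauses.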