Ensembles of Classifiers for Parallel Categorization of Large Number of Text Documents Expressing Opinions

2016 
Opinions provided by people who have used services or purchased goods are a rich source of knowledge. Opinion classification, which mostly applies supervised classifiers, is one of the essential tasks. The technological capabilities of computers remain a major obstacle, especially when processing huge volumes of data. This study proposes and experimentally evaluates the application of parallelism to the classification of a very large number of opposing opinions expressed as freely written text reviews. Instead of training a single classifier on the entire data set, an ensemble of classifiers is trained on disjoint subsets of the data, and a group decision is used to classify unlabelled items. The main assessment criteria are computational efficiency and error rate, combined into a single measure so that ensembles of different sizes can be compared. Support vector machines, artificial neural networks, and decision trees, all of which are frequently used classification methods, were examined. The paper demonstrates the viability of the suggested method when the number of text reviews leads to computational complexity beyond the capabilities of a contemporary common PC. Classification accuracy and the values of other classification performance measures (Precision, Recall, F-measure) did not decrease, which is a positive finding.
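The ensemble scheme summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: a toy word-count classifier stands in for the support vector machines, neural networks, and decision trees examined in the paper, majority voting is assumed as the form of the "group decision", and all function names and the sample reviews are hypothetical. A thread pool is used only to show that the members can be trained independently of one another.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_disjoint(items, k):
    """Partition items into k disjoint subsets (round-robin assignment)."""
    return [items[i::k] for i in range(k)]


def train_member(subset):
    """Toy stand-in for a real classifier: per-word label counts."""
    counts = {}
    for text, label in subset:
        for word in text.lower().split():
            counts.setdefault(word, Counter())[label] += 1
    return counts


def predict_member(model, text, default="pos"):
    """Sum label counts over the words of the text; return the top label."""
    total = Counter()
    for word in text.lower().split():
        total.update(model.get(word, {}))
    return total.most_common(1)[0][0] if total else default


def train_ensemble(labelled, k):
    """Train k members, each on its own disjoint subset of the data."""
    subsets = split_disjoint(labelled, k)
    with ThreadPoolExecutor(max_workers=k) as pool:
        return list(pool.map(train_member, subsets))


def predict_ensemble(models, text):
    """Group decision (assumed here): majority vote over the members."""
    votes = Counter(predict_member(m, text) for m in models)
    return votes.most_common(1)[0][0]


# Hypothetical labelled reviews, for illustration only.
reviews = [
    ("great hotel friendly staff", "pos"),
    ("dirty room rude staff", "neg"),
    ("great location clean room", "pos"),
    ("rude reception dirty bathroom", "neg"),
    ("friendly service great breakfast", "pos"),
    ("dirty sheets rude manager", "neg"),
]

models = train_ensemble(reviews, k=3)
print(predict_ensemble(models, "great friendly place"))
print(predict_ensemble(models, "dirty and rude"))
```

Using an odd number of members with two classes avoids tied votes. With real classifiers, each member sees only a fraction of the data, which is where the reduction in training cost would come from.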