Ensembles of Classifiers for Parallel Categorization of Large Number of Text Documents Expressing Opinions

Frantisek Darena, Jan Zizka

2016

Opinions provided by people who have used services or purchased goods are a rich source of knowledge. Opinion classification, which mostly relies on supervised classifiers, is one of the essential tasks in processing them. The technological capabilities of computers remain a major obstacle, especially when processing huge volumes of data. This study proposes and experimentally evaluates the application of parallelism to the classification of a very large number of opposing opinions expressed as freely written text reviews. Instead of training a single classifier on the entire data set, an ensemble of classifiers is trained on disjoint subsets of the data, and a group decision is used to classify unlabelled items. The main assessment criteria are computational efficiency and error rate, combined into a single measure so that ensembles of different sizes can be compared. Support vector machines, artificial neural networks, and decision trees, all frequently used classification methods, were examined. The paper demonstrates the viability of the suggested method when the number of text reviews leads to computational complexity beyond the capabilities of a contemporary common PC. Classification accuracy and the values of other classification performance measures (Precision, Recall, F-measure) did not decrease, which is a positive finding.
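
The ensemble idea described in the abstract (members trained on disjoint subsets, then a group decision) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The tiny naive Bayes member model is a stand-in, chosen only to keep the example dependency-free, for the support vector machines, neural networks, and decision trees examined in the paper. The round-robin split, the plain majority vote, the function names, and the toy reviews are all assumptions made for illustration.

```python
import math
from collections import Counter


def train_member(docs, labels):
    """Train one ensemble member: a tiny multinomial naive Bayes model.

    Stand-in (assumption) for the SVM / neural network / decision tree
    members examined in the paper.
    """
    word_counts = {}
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return {"word_counts": word_counts,
            "class_counts": Counter(labels),
            "vocab": vocab}


def predict_member(model, text):
    """Return the most probable label under one member (Laplace smoothing)."""
    tokens = text.lower().split()
    total_docs = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label in sorted(model["class_counts"]):
        counts = model["word_counts"][label]
        n_words = sum(counts.values())
        score = math.log(model["class_counts"][label] / total_docs)
        for tok in tokens:
            score += math.log((counts[tok] + 1) / (n_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def split_disjoint(docs, labels, k):
    """Partition the data into k disjoint subsets (round-robin, an assumption)."""
    return [(docs[i::k], labels[i::k]) for i in range(k)]


def train_ensemble(docs, labels, k):
    """Train k members independently; each sees only its own subset."""
    return [train_member(d, l) for d, l in split_disjoint(docs, labels, k)]


def predict_ensemble(models, text):
    """Group decision by simple majority vote over the members' predictions."""
    votes = Counter(predict_member(m, text) for m in models)
    return votes.most_common(1)[0][0]


# Toy reviews (hypothetical data, for illustration only).
docs = [
    "great hotel friendly staff", "terrible room dirty bathroom",
    "great location clean room", "terrible service rude staff",
    "friendly staff great breakfast", "dirty room terrible smell",
    "clean room great view", "rude staff dirty lobby",
    "great stay friendly people", "terrible stay rude people",
    "clean great friendly", "dirty terrible rude",
]
labels = ["pos", "neg"] * 6

models = train_ensemble(docs, labels, k=3)
positive_guess = predict_ensemble(models, "great friendly clean")
negative_guess = predict_ensemble(models, "terrible dirty rude")
```

Because each member is trained without reference to the others, the calls inside `train_ensemble` could be dispatched to separate processes or machines, which is where a parallel speed-up would come from. An odd number of members avoids tied votes in a two-class problem.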
