Text Categorization as a Graph Classification Problem

François Rousseau, Emmanouil Kiagias, Michalis Vazirgiannis

2015

In this paper, we consider the task of text categorization as a graph classification problem. By representing textual documents as graphs-of-words instead of historical n-gram bags-of-words, we extract more discriminative features that correspond to long-distance n-grams through frequent subgraph mining. Moreover, by capitalizing on the concept of k-core, we reduce the graph representation to its densest part, its main core, which speeds up the feature extraction step at little to no cost in prediction performance. Experiments on four standard text classification datasets show statistically significantly higher accuracy and macro-averaged F1-score compared to baseline approaches.
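As an illustration of the two graph operations named in the abstract, the following minimal Python sketch builds a graph-of-words from a token list and reduces it to its main core. It is not the authors' implementation: the function names `graph_of_words` and `main_core`, the undirected unweighted edges, and the default sliding window of 4 tokens are assumptions made for this example, and the frequent subgraph mining step is left out entirely.

```python
from collections import defaultdict


def graph_of_words(tokens, window=4):
    """Build an undirected, unweighted graph-of-words.

    Two distinct terms are linked when they co-occur within a sliding
    window of `window` consecutive tokens (window size is an assumption).
    Returns a dict mapping each term to the set of its neighbours.
    """
    adj = defaultdict(set)
    for i, u in enumerate(tokens):
        adj[u]  # make sure isolated terms still appear as nodes
        for v in tokens[i + 1 : i + window]:
            if u != v:
                adj[u].add(v)
                adj[v].add(u)
    return dict(adj)


def main_core(adj):
    """Return (k, nodes) for the k-core with the largest k.

    A k-core is a maximal subgraph in which every node has degree >= k.
    We raise k step by step, peeling nodes whose degree is too low,
    and keep the last non-empty subgraph.
    """
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    k, core = 0, set(adj)
    while True:
        # Peel every node that cannot belong to the (k + 1)-core.
        low = [u for u, vs in adj.items() if len(vs) < k + 1]
        while low:
            for u in low:
                for v in adj.pop(u):
                    adj[v].discard(u)
            low = [u for u, vs in adj.items() if len(vs) < k + 1]
        if not adj:
            return k, core
        k += 1
        core = set(adj)


if __name__ == "__main__":
    doc = "text categorization as a graph classification problem".split()
    g = graph_of_words(doc, window=4)
    k, nodes = main_core(g)
    print(k, sorted(nodes))
```

In a pipeline along the lines the abstract describes, subgraph features would then be mined from the reduced graphs rather than from the full ones, which is where the speed-up would come from.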

Keywords:

Computer science
Pattern recognition
Categorization
Discriminative model
Boosting methods for object categorization
Graph (abstract data type)
Text graph
Feature extraction
Machine learning
Artificial intelligence
Graph
Natural language processing
graph classification
text categorization

