Efficient Social Network Multilingual Classification using Character, POS n-grams and Dynamic Normalization

Carlos-Emiliano González-Gallardo, Juan-Manuel Torres-Moreno, Azucena Montes Rendón, Gerardo Sierra

2017

In this paper we describe a dynamic normalization process applied to multilingual social network documents (Facebook and Twitter) to improve performance on the author profiling task for short texts. After normalization, character $n$-grams and POS-tag $n$-grams are extracted to capture as much as possible of the stylistic information encoded in the documents (emoticons, character flooding, capital letters, references to other users, hyperlinks, hashtags, etc.). Experiments with an SVM classifier reached up to 90% performance.
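
The normalization and character $n$-gram steps mentioned in the abstract could look roughly like the following minimal sketch. It is an illustration under stated assumptions, not the authors' implementation: the placeholder labels (`URL`, `USER`, `HASHTAG`), the regular expressions, the rule that shortens flooded characters to three repetitions, and the function names `normalize` and `char_ngrams` are all hypothetical. POS tagging and the SVM classifier are omitted because they depend on external tools.

```python
import re
from collections import Counter

# Hypothetical placeholder labels; the abstract does not specify the actual tags.
URL_TAG = "URL"
USER_TAG = "USER"
HASHTAG_TAG = "HASHTAG"


def normalize(text: str) -> str:
    """Replace volatile tokens with fixed labels and shorten character flooding."""
    # Hyperlinks first, so that '#' or '@' inside a URL is not tagged separately.
    text = re.sub(r"https?://\S+", URL_TAG, text)
    # References to other users, e.g. "@ana".
    text = re.sub(r"@\w+", USER_TAG, text)
    # Hashtags, e.g. "#fun".
    text = re.sub(r"#\w+", HASHTAG_TAG, text)
    # Character flooding: runs of 3 or more identical characters become exactly 3
    # (an assumed rule that keeps the stylistic cue while bounding the vocabulary).
    text = re.sub(r"(.)\1{2,}", r"\1\1\1", text)
    return text


def char_ngrams(text: str, n: int) -> Counter:
    """Count overlapping character n-grams of length n."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


if __name__ == "__main__":
    sample = "Sooooo cool!!!! @ana see http://t.co/x #fun"
    clean = normalize(sample)
    print(clean)
    print(char_ngrams(clean, 3).most_common(5))
```

The resulting counts would typically be turned into a feature vector per document and passed to a classifier such as an SVM.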

Keywords:

Support vector machine
Computer science
Normalization (statistics)
Profiling (computer programming)
Information retrieval
Hyperlink
Social network
