A comparison of two text representations for sentiment analysis

2010 
This paper compares two text representations for sentiment orientation analysis within the same experimental setting, focusing in particular on how sensitive the analysis is to sentence length. The two representations are bag-of-words (BoW) and a nine-dimensional vector (9Dim). BoW represents text as a high-dimensional feature vector that ignores grammatical structure and is lexicon-dependent. In contrast, the 9Dim representation encodes grammatical knowledge about the clauses in sentences into a compact nine-dimensional vector that is lexicon-independent. Because the grammatical structure of a single sentence or clause may not provide sufficient information for sentiment orientation classification, each text is composed of multiple sentences. Composing a text from multiple sentences is a convenient way to enrich its grammatical knowledge, and it also lengthens the sample. We therefore consider text length an important factor in text classification. The aim of this paper is to demonstrate how the performance of text sentiment orientation classifiers improves as the length of the sentence comprising a training vector is varied. The experimental results indicate that classifier accuracy benefits from increasing text length, and that the 9Dim method provides results comparable to BoW under the same sentiment classification algorithm, support vector machines (SVM).
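
The two ideas the abstract relies on, composing a sample from several sentences and representing it as a bag-of-words vector, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names (`compose_samples`, `build_vocabulary`, `bow_vector`), whitespace tokenization, lowercasing, and the grouping of consecutive sentences into fixed-size samples are all hypothetical choices. The 9Dim representation and the SVM training step are not shown, since the abstract does not specify them.

```python
from collections import Counter


def compose_samples(sentences, k):
    """Join consecutive sentences into samples of k sentences each.

    The final sample may hold fewer than k sentences. Grouping by
    position is an illustrative assumption, not the paper's procedure.
    """
    return [" ".join(sentences[i:i + k]) for i in range(0, len(sentences), k)]


def build_vocabulary(samples):
    """Map each distinct lowercase token to a column index (sorted order)."""
    tokens = sorted({tok for s in samples for tok in s.lower().split()})
    return {tok: idx for idx, tok in enumerate(tokens)}


def bow_vector(sample, vocab):
    """Return token counts in vocabulary order.

    Word order and grammar are discarded, and tokens missing from the
    vocabulary are ignored, which is why BoW is lexicon-dependent.
    """
    counts = Counter(sample.lower().split())
    return [counts.get(tok, 0) for tok in vocab]


# Toy data, invented for this sketch only.
sentences = ["good plot", "good acting", "bad ending", "bad pacing"]
samples = compose_samples(sentences, k=2)
vocab = build_vocabulary(samples)
vectors = [bow_vector(s, vocab) for s in samples]
```

Raising `k` lengthens each sample, so each vector aggregates counts over more sentences; this is the text-length factor whose effect on classifier accuracy the paper studies. The resulting vectors could then be passed to any classifier, such as an SVM.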