Image–text sentiment analysis via deep multimodal attentive fusion
2019
Abstract

Sentiment analysis of social media data is crucial for understanding people's positions, attitudes, and opinions toward a given event, with applications such as election prediction and product evaluation. While considerable effort has been devoted to single-modality analysis (image or text), much less attention has been paid to the joint analysis of multimodal data in social media. Most existing methods for multimodal sentiment analysis simply combine the different data modalities, which yields unsatisfactory performance on sentiment classification. In this paper, we propose a novel image–text sentiment analysis model, Deep Multimodal Attentive Fusion (DMAF), which exploits discriminative features and the internal correlation between visual and semantic content within a mixed fusion framework. Specifically, to automatically focus on the discriminative regions and important words most related to sentiment, two separate unimodal attention models are proposed to learn effective emotion classifiers for the visual and textual modalities, respectively. Then, an intermediate-fusion multimodal attention model is proposed to exploit the internal correlation between visual and textual features for joint sentiment classification. Finally, a late fusion scheme combines the three attention models for sentiment prediction. Extensive experiments demonstrate the effectiveness of our approach on both weakly labeled and manually labeled datasets.
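To make the mixed fusion design concrete, below is a minimal sketch of a DMAF-style architecture in PyTorch. It is not the authors' implementation: the module names, feature dimensions (2048-d region features, 300-d word embeddings), the specific attention scoring function, and the use of logit averaging for late fusion are all illustrative assumptions; the paper's exact architecture and fusion weights may differ.

```python
import torch
import torch.nn as nn


class AttentionPool(nn.Module):
    """Soft attention over a set of feature vectors (image regions or words)."""

    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # hypothetical scoring function

    def forward(self, feats):  # feats: (batch, n, dim)
        weights = torch.softmax(self.score(feats), dim=1)  # (batch, n, 1)
        return (weights * feats).sum(dim=1)                # (batch, dim)


class DMAFSketch(nn.Module):
    """Illustrative mixed fusion: two unimodal attention classifiers,
    one intermediate-fusion multimodal classifier, and a late fusion
    (here: simple averaging) of the three predictions."""

    def __init__(self, vis_dim=2048, txt_dim=300, hid=512, n_classes=2):
        super().__init__()
        # unimodal attention branches
        self.vis_att = AttentionPool(vis_dim)
        self.txt_att = AttentionPool(txt_dim)
        self.vis_cls = nn.Sequential(nn.Linear(vis_dim, hid), nn.ReLU(),
                                     nn.Linear(hid, n_classes))
        self.txt_cls = nn.Sequential(nn.Linear(txt_dim, hid), nn.ReLU(),
                                     nn.Linear(hid, n_classes))
        # intermediate fusion: attend jointly over projected features
        self.vis_proj = nn.Linear(vis_dim, hid)
        self.txt_proj = nn.Linear(txt_dim, hid)
        self.mm_att = AttentionPool(hid)
        self.mm_cls = nn.Linear(hid, n_classes)

    def forward(self, regions, words):
        # regions: (batch, R, vis_dim); words: (batch, W, txt_dim)
        p_vis = self.vis_cls(self.vis_att(regions))
        p_txt = self.txt_cls(self.txt_att(words))
        joint = torch.cat([self.vis_proj(regions),
                           self.txt_proj(words)], dim=1)  # (batch, R+W, hid)
        p_mm = self.mm_cls(self.mm_att(joint))
        # late fusion: average the three classifiers' logits
        return (p_vis + p_txt + p_mm) / 3.0
```

In this sketch the unimodal branches learn which regions and words carry sentiment, the intermediate branch attends over both modalities in a shared space to capture their correlation, and the late fusion step combines all three predictions, mirroring the three-stage design described in the abstract.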