Multimodal Representation with Embedded Visual Guiding Objects for Named Entity Recognition in Social Media Posts

2020 
Visual context often helps to recognize named entities more precisely in short texts such as tweets or Snapchat posts. For example, one can identify "Charlie" as the name of a dog from the image attached to a user's post. Previous work on multimodal named entity recognition ignores the correspondence between visual objects and entities. We regard visual objects as fine-grained image representations: for a sentence containing multiple entity types, the objects detected in the associated image can be used to capture information about different entities. In this paper, we propose a neural network that combines object-level image information and character-level text information to predict entities. Vision and language are bridged by embedding object labels, and a dense co-attention mechanism is introduced for fine-grained interactions. Experimental results on a Twitter dataset demonstrate that our method outperforms state-of-the-art methods.
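As a rough illustration of the fusion idea the abstract describes, the PyTorch sketch below embeds detected object labels in the same space as tweet tokens and uses a pair of cross-attention layers as a stand-in for the paper's dense co-attention before scoring entity tags per token. Everything here is an assumption rather than the authors' implementation: the class name `DenseCoAttentionFusion`, the dimensions, the word-level (rather than character-level) embeddings, and the linear tag head are all illustrative simplifications.

```python
import torch
import torch.nn as nn

class DenseCoAttentionFusion(nn.Module):
    """Hypothetical sketch: object labels from an image are embedded like
    words, then text and object representations attend to each other
    (a simplified co-attention) before per-token entity tagging."""

    def __init__(self, vocab_size, label_vocab_size, dim=128, num_tags=9):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, dim)       # token embeddings
        self.obj_emb = nn.Embedding(label_vocab_size, dim)  # object-label embeddings
        self.txt2img = nn.MultiheadAttention(dim, 4, batch_first=True)
        self.img2txt = nn.MultiheadAttention(dim, 4, batch_first=True)
        self.tagger = nn.Linear(2 * dim, num_tags)          # BIO tag scores per token

    def forward(self, token_ids, object_label_ids):
        t = self.word_emb(token_ids)        # (B, T, dim) text side
        o = self.obj_emb(object_label_ids)  # (B, O, dim) visual-object side
        # Text attends to objects and objects attend to text (cross-attention
        # in both directions, standing in for dense co-attention).
        t_ctx, _ = self.txt2img(t, o, o)
        o_ctx, _ = self.img2txt(o, t, t)
        # Pool the grounded object representations and broadcast over tokens.
        o_pool = o_ctx.mean(dim=1, keepdim=True).expand_as(t_ctx)
        return self.tagger(torch.cat([t_ctx + t, o_pool], dim=-1))  # (B, T, num_tags)

# Usage: a 16-token tweet paired with 3 detected objects.
model = DenseCoAttentionFusion(vocab_size=30000, label_vocab_size=80)
tokens = torch.randint(0, 30000, (1, 16))
objects = torch.randint(0, 80, (1, 3))
print(model(tokens, objects).shape)  # torch.Size([1, 16, 9])
```

A sequence-labeling model in this setting would typically decode the tag scores with a CRF layer; the linear head above is kept only to keep the sketch short.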