A Caption Is Worth A Thousand Images: Investigating Image Captions for Multimodal Named Entity Recognition.

Shuguang Chen, Gustavo Aguilar, Leonardo Neves, Thamar Solorio

2020

Multimodal named entity recognition (MNER) requires bridging the gap between language understanding and visual context. Owing to advances in natural language processing (NLP) and computer vision (CV), many neural techniques have been proposed to incorporate images into the NER task. In this work, we conduct a detailed analysis of current state-of-the-art fusion techniques for MNER and describe scenarios in which adding information from the image does not always improve performance. We also study the use of captions as a way to enrich the context for MNER. We provide an extensive empirical analysis and an ablation study on three datasets from popular social platforms to expose the situations in which the approach is beneficial.

Keywords:

Language understanding
Artificial intelligence
Named-entity recognition
Natural language processing
Computer science

