Gated multimodal networks
2020
This paper considers the problem of leveraging multiple sources of information, or data modalities (e.g., images and text), in neural networks. We propose a novel model, the gated multimodal unit (GMU), designed as an internal unit of a neural network architecture whose purpose is to find an intermediate representation based on a combination of data from different modalities.
The GMU learns to decide how modalities influence the activation of the unit using multiplicative gates.
The GMU can be used as a building block for different kinds of neural networks and can be seen as a form of intermediate fusion. The model was evaluated on two multimodal learning tasks in conjunction with fully connected and convolutional neural networks. We compare the GMU with early- and late-fusion methods; it achieves higher classification scores than these baselines on two benchmark datasets: MM-IMDb and DeepScene.