Learning Disentangled Representation in Latent Stochastic Models: A Case Study with Image Captioning

2019 
Multimodal tasks require learning a joint representation across modalities. In this paper, we present an approach that employs latent stochastic models for a multimodal task: image captioning. Encoder-decoder models with stochastic latent variables often face optimization issues such as latent collapse, which prevents them from realizing their full potential for rich representation learning and disentanglement. We present an approach to training such models by incorporating a joint continuous and discrete representation in the prior distribution. We evaluate the performance of the proposed approach on a multitude of metrics against vanilla latent stochastic models. We also perform a qualitative assessment and observe that the proposed approach indeed has the potential to learn composite information and explain novel combinations not seen in the training data.
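The abstract's central idea, a prior combining continuous and discrete latent variables, can be illustrated with a minimal sketch. The snippet below is not the paper's implementation; it only shows the standard building blocks such a model would use: a reparameterized Gaussian sample for the continuous part and a Gumbel-Softmax relaxation for the discrete part, concatenated into one latent vector for a caption decoder. All variable names and dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_gaussian(mu, log_var):
    # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I),
    # keeps the sample differentiable w.r.t. encoder outputs (mu, log_var).
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def sample_gumbel_softmax(logits, tau=0.5):
    # Gumbel-Softmax: a differentiable relaxation of a categorical sample.
    # Lower temperature tau -> closer to a one-hot vector.
    g = -np.log(-np.log(rng.uniform(size=logits.shape) + 1e-20) + 1e-20)
    y = (logits + g) / tau
    y = np.exp(y - y.max(axis=-1, keepdims=True))
    return y / y.sum(axis=-1, keepdims=True)

# Hypothetical encoder outputs for a single image
mu, log_var = np.zeros(8), np.zeros(8)  # continuous latent, dim 8 (assumed)
logits = np.zeros(4)                    # discrete latent, 4 categories (assumed)

z_cont = sample_gaussian(mu, log_var)
z_disc = sample_gumbel_softmax(logits)
z = np.concatenate([z_cont, z_disc])    # joint latent fed to the caption decoder
```

The discrete component gives the model a natural handle on compositional factors (e.g. object categories), while the continuous component captures smooth variation; mixing both in the prior is one common way to mitigate latent collapse.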