Attribute Centered Multimodal Response Generation in a Dialogue System

2021 
Dialogue systems have become a prominent platform for human-machine interaction. Ongoing research in vision and language has opened new frontiers for building multimodal dialogue systems that incorporate information from complementary sources such as text, images, audio, and video. For interactive systems, multimodal knowledge in the form of different modalities needs to be presented to the user for effective communication. Recent research has mainly focused on textual generation in a multimodal setting, thereby providing incomplete information to the user. Hence, in the current work, we present an attribute-centered image generation framework for a multimodal dialogue system that generates images conditioned on textual features with the help of a combined taxonomy-attribute tree. The visual features interact with multi-head-attended textual features through an attention-based factorized bilinear pooling approach to obtain a fine-grained multimodal representation. This representation is then used to generate images with an adversarial network, whose loss functions encourage the generated images to closely resemble natural images. We perform our experiments on the Multimodal Dialog Dataset (MMD), creating contextualized images from fashion attribute features. Empirical studies show that the generated attribute-centered images help make the dialogue more engaging for users.
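As a rough illustration of the fusion step described above, the PyTorch sketch below combines multi-head-attended textual features with visual features through an MFB-style attention-based factorized bilinear pooling. It is a minimal sketch under assumed settings: the module name, feature dimensions, and hyperparameters (factor_dim, k, num_heads) are illustrative choices, not details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveMFBFusion(nn.Module):
    """Illustrative fusion of text and image features.

    Multi-head attention is applied over the textual (dialogue-context)
    features, and the attended summary is fused with pooled visual features
    via factorized bilinear pooling. All dimensions are assumptions.
    """

    def __init__(self, text_dim=512, img_dim=2048, factor_dim=1024,
                 k=5, num_heads=8):
        super().__init__()
        # Multi-head self-attention over the textual features.
        self.text_attn = nn.MultiheadAttention(text_dim, num_heads,
                                               batch_first=True)
        # Low-rank projections for factorized bilinear pooling.
        self.text_proj = nn.Linear(text_dim, factor_dim * k)
        self.img_proj = nn.Linear(img_dim, factor_dim * k)
        self.factor_dim = factor_dim
        self.k = k

    def forward(self, text_feats, img_feat):
        # text_feats: (B, T, text_dim) token/utterance features
        # img_feat:   (B, img_dim) pooled visual features
        attended, _ = self.text_attn(text_feats, text_feats, text_feats)
        text_vec = attended.mean(dim=1)                  # (B, text_dim)

        # Element-wise product in the factorized space, then sum-pool over k.
        joint = self.text_proj(text_vec) * self.img_proj(img_feat)
        joint = joint.view(-1, self.factor_dim, self.k).sum(dim=2)

        # Signed square-root and L2 normalization, as is standard for MFB.
        joint = torch.sign(joint) * torch.sqrt(torch.abs(joint) + 1e-8)
        return F.normalize(joint, dim=1)                 # (B, factor_dim)
```

In a conditional image-generation setup of the kind the abstract describes, the fused vector could, for example, be concatenated with a noise vector and fed to a GAN generator; the exact conditioning scheme used in the paper is not specified here.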