Weakly Supervised Attention Inference Generative Adversarial Network for Text-to-Image

2019 
Text-to-image synthesis is a significant problem in computer vision, and current methods still struggle with the quality and semantic consistency of the generated images. In this paper we propose an approach to text-to-image synthesis that focuses on perception. Instead of generating target images directly, we use text embeddings to generate semantic feature maps before synthesizing the target images. The ground-truth semantic layouts are computed by an interpretable classification network, and the model learns to generate semantic layouts before inferring target images from them. We train our approach on the CUB-2011 dataset and verify the quality of its generations and the interpretability of the network on simple backgrounds and small-scale feature generation.
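The abstract does not specify the exact architecture, but the two-stage idea (text embedding to semantic layout, then layout to image) can be illustrated with a minimal sketch. The module names, layer sizes, and spatial resolutions below are assumptions for illustration only, not the authors' implementation; the weakly supervised attention/classification network that produces the ground-truth layouts is omitted.

```python
import torch
import torch.nn as nn

class LayoutGenerator(nn.Module):
    """Stage 1 (illustrative): map a sentence embedding plus noise to a
    coarse semantic feature map (layout) instead of a full image."""
    def __init__(self, embed_dim=256, noise_dim=100, layout_channels=8, layout_size=16):
        super().__init__()
        self.layout_channels = layout_channels
        self.layout_size = layout_size
        self.fc = nn.Sequential(
            nn.Linear(embed_dim + noise_dim,
                      layout_channels * layout_size * layout_size),
            nn.ReLU(inplace=True),
        )

    def forward(self, text_emb, noise):
        x = self.fc(torch.cat([text_emb, noise], dim=1))
        return x.view(-1, self.layout_channels, self.layout_size, self.layout_size)

class ImageGenerator(nn.Module):
    """Stage 2 (illustrative): upsample the predicted layout, conditioned on
    the text embedding, into an RGB image."""
    def __init__(self, embed_dim=256, layout_channels=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(layout_channels + embed_dim, 128, 4, 2, 1),  # 16 -> 32
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1),                           # 32 -> 64
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, 3, 1, 1),
            nn.Tanh(),
        )

    def forward(self, layout, text_emb):
        # Broadcast the text embedding spatially and concatenate with the layout.
        b, _, h, w = layout.shape
        cond = text_emb.view(b, -1, 1, 1).expand(-1, -1, h, w)
        return self.net(torch.cat([layout, cond], dim=1))

# Usage: text embedding -> semantic layout -> image.
text_emb = torch.randn(4, 256)              # e.g. from a pretrained sentence encoder
noise = torch.randn(4, 100)
layout = LayoutGenerator()(text_emb, noise)  # (4, 8, 16, 16) semantic feature map
image = ImageGenerator()(layout, text_emb)   # (4, 3, 64, 64) synthesized image
```

In training, the intermediate layout would be supervised against the layouts derived from the interpretable classification network, while the final image would receive the usual adversarial and text-matching losses.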