PCCM-GAN: Photographic Text-to-Image Generation with Pyramid Contrastive Consistency Model
2021
Abstract Synthesizing photographic images from given text descriptions is a challenging problem. Although previous many studies have made significant progress on the visual quality of the generated images by using the multi-stage and attentional network, they ignore the interrelationships between the images generated by the generator in each stage and simply leverage the attention mechanism. In this paper, the Photographic Text-to-Image Generation with Pyramid Contrastive Consistency Model (PCCM-GAN) is proposed to generate photographic images. PCCM-GAN introduces two modules: a pyramid contrastive consistency model (PCCM) and a stack attention model (Stack-Attn). Based on generated images from the different stages, PCCM is proposed to compute a contrastive loss for training the generator. Stack-Attn concentrates on generating images with more details and better semantic consistency by stacking the global-local attention mechanism. And visual inspection of the inner product of PCCM and Stack-Attn is also performed to validate their effectiveness. Extensive experiments and ablation studies on the CUB and MS-COCO datasets prove the superiority of the proposed method.
Keywords:
- Correction
- Source
- Cite
- Save
- Machine Reading By IdeaReader
42
References
1
Citations
NaN
KQI