PCCM-GAN: Photographic Text-to-Image Generation with Pyramid Contrastive Consistency Model

2021 
Abstract: Synthesizing photographic images from given text descriptions is a challenging problem. Although many previous studies have made significant progress on the visual quality of the generated images by using multi-stage and attentional networks, they ignore the interrelationships between the images generated at each stage and simply leverage the attention mechanism. In this paper, Photographic Text-to-Image Generation with Pyramid Contrastive Consistency Model (PCCM-GAN) is proposed to generate photographic images. PCCM-GAN introduces two modules: a pyramid contrastive consistency model (PCCM) and a stack attention model (Stack-Attn). PCCM computes a contrastive loss over the images generated at the different stages and uses it to train the generator. Stack-Attn focuses on generating images with more details and better semantic consistency by stacking a global-local attention mechanism. Visual inspection of the inner products of PCCM and Stack-Attn is also performed to validate their effectiveness. Extensive experiments and ablation studies on the CUB and MS-COCO datasets demonstrate the superiority of the proposed method.
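The abstract describes PCCM as computing a contrastive loss over images produced at different generator stages. The snippet below is a minimal sketch of how such a loss could be formed, assuming an InfoNCE-style objective in which the same sample at two adjacent stages forms a positive pair and the other samples in the batch act as negatives; the encoder, temperature, and pairing scheme are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of a pyramid contrastive consistency loss across
# generator stages. Illustrative only; the encoder, temperature, and
# positive/negative pairing are assumptions, not the authors' code.
import torch
import torch.nn.functional as F

def pyramid_contrastive_loss(stage_images, encoder, temperature=0.1):
    """stage_images: list of tensors [B, 3, H_s, W_s], one per generator stage.
    encoder: any image encoder mapping a batch of images to [B, D] features."""
    # Resize every stage to the final-stage resolution and extract features.
    target_size = stage_images[-1].shape[-2:]
    feats = []
    for imgs in stage_images:
        imgs = F.interpolate(imgs, size=target_size, mode="bilinear",
                             align_corners=False)
        feats.append(F.normalize(encoder(imgs), dim=-1))  # [B, D], unit norm

    loss = 0.0
    num_pairs = 0
    # InfoNCE between adjacent stages: the same sample index is the positive,
    # all other batch samples are negatives.
    for lo, hi in zip(feats[:-1], feats[1:]):
        logits = hi @ lo.t() / temperature              # [B, B] similarities
        labels = torch.arange(logits.size(0), device=logits.device)
        loss = loss + F.cross_entropy(logits, labels)
        num_pairs += 1
    return loss / max(num_pairs, 1)
```

In this sketch the loss would be added to the generator's objective so that corresponding images from coarse and fine stages stay consistent while differing from other samples in the batch.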