Multi-modal generative adversarial network for zero-shot learning

2020 
Abstract In this paper, we propose a novel approach for Zero-Shot Learning (ZSL), where the test instances come from novel categories for which no visual data are available during training. Existing approaches typically address ZSL by embedding the visual features into a category-shared semantic space. However, these embedding-based approaches are prone to the "heterogeneity gap" issue, since a single type of class semantic prototype cannot characterize the categories well. To alleviate this issue, we assume that different class semantics reflect different views of the corresponding class, and thus fuse various types of class semantic prototypes residing in different semantic spaces with a feature fusion network to generate pseudo visual features. Through an adversarial mechanism between the real visual features and the fused pseudo visual features, the complementary semantics in the various spaces are effectively captured. Experimental results on three benchmark datasets demonstrate that the proposed approach achieves impressive performance on both traditional ZSL and generalized ZSL tasks.
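
For concreteness, below is a minimal PyTorch sketch of the fuse-then-adversarially-train idea described in the abstract. It is not the authors' implementation: the two semantic views (attribute vectors and word embeddings), the layer sizes, the feature dimensions, and the plain GAN objective are all illustrative assumptions.

```python
# Minimal sketch (assumed architecture, not the paper's released code).
# Two semantic prototypes of a class are fused with noise into a pseudo
# visual feature; a discriminator contrasts it with real visual features.
import torch
import torch.nn as nn

ATTR_DIM, WORD_DIM, NOISE_DIM, VIS_DIM = 85, 300, 100, 2048  # assumed sizes

class FusionGenerator(nn.Module):
    """Fuses several class semantic prototypes (different 'views' of a
    class) with noise and maps them to a pseudo visual feature."""
    def __init__(self):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(ATTR_DIM + WORD_DIM + NOISE_DIM, 1024),
            nn.LeakyReLU(0.2),
            nn.Linear(1024, VIS_DIM),
            nn.ReLU(),  # CNN features (e.g., ResNet) are non-negative
        )

    def forward(self, attr, word, noise):
        return self.fuse(torch.cat([attr, word, noise], dim=1))

class Discriminator(nn.Module):
    """Scores whether a visual feature is real or a fused pseudo feature,
    conditioned on the same semantic prototypes."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(VIS_DIM + ATTR_DIM + WORD_DIM, 1024),
            nn.LeakyReLU(0.2),
            nn.Linear(1024, 1),
        )

    def forward(self, vis, attr, word):
        return self.net(torch.cat([vis, attr, word], dim=1))

# One adversarial step on a toy batch (a vanilla GAN loss here; WGAN-style
# objectives, common in feature-generating ZSL, are a natural swap-in).
G, D = FusionGenerator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

B = 32
attr, word = torch.randn(B, ATTR_DIM), torch.randn(B, WORD_DIM)
real_vis = torch.randn(B, VIS_DIM).abs()  # stand-in for real CNN features

# Discriminator: real visual features vs. fused pseudo features
fake_vis = G(attr, word, torch.randn(B, NOISE_DIM)).detach()
d_loss = bce(D(real_vis, attr, word), torch.ones(B, 1)) + \
         bce(D(fake_vis, attr, word), torch.zeros(B, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator: make fused pseudo features indistinguishable from real ones
fake_vis = G(attr, word, torch.randn(B, NOISE_DIM))
g_loss = bce(D(fake_vis, attr, word), torch.ones(B, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

As in other feature-generating ZSL pipelines, once such a generator is trained one would synthesize pseudo visual features for the unseen classes from their semantic prototypes and train an ordinary classifier on them, which also covers the generalized ZSL setting.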