Semantic-aligned reinforced attention model for zero-shot learning

2022 
Zero-shot learning (ZSL) aims to recognize images from unseen classes by transferring semantic knowledge from seen classes to unseen classes. For example, although humans have never seen a zebra, if we know that “a horse with stripes is a zebra”, we can easily recognize one when we see it. Given semantic descriptions, humans can capture intrinsic visual clues from different channels or appearance factors (e.g., color, texture) on salient parts. Computers cannot yet do this with high accuracy; they still need to improve the learning of semantic-aligned visual representations. We therefore propose a semantic-aligned reinforced attention (SRA) model to improve the ability to localize attributes. We aim to discover invariant features related to class-level semantic attributes from variable intra-class visual information, and thereby avoid misalignment between rich visual information and simple semantic representations. Specifically, for spatial attention localization, we develop an efficient constraint applied directly on the feature map to ensure intra-attention compactness and inter-attention dispersion, similar to human gaze. For the channel dimension, we propose a novel attribute attention cross-entropy loss to exploit the supervision provided by each semantic attribute subset. Experiments on three ZSL benchmarks, i.e., CUB, SUN and AWA2, indicate the competitiveness of our proposed method against state-of-the-art ZSL methods.
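The abstract does not give the exact form of the spatial constraint, so the following is only an illustrative sketch of one common way to express "intra-attention compactness and inter-attention dispersion", not the SRA model's actual loss. It assumes each attention map is normalized into a spatial distribution; compactness is measured as the weighted spatial variance around the map's centroid, and dispersion as a hinge penalty when two maps' centroids are closer than a margin. The function names, `margin`, and `weight` are hypothetical.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]  # one H x W attention map (hypothetical representation)


def normalize(att: Grid) -> Grid:
    """Turn a non-negative attention map into a spatial distribution summing to 1."""
    total = sum(sum(row) for row in att)
    if total <= 0:
        raise ValueError("attention map must have positive mass")
    return [[v / total for v in row] for row in att]


def centroid(p: Grid) -> Tuple[float, float]:
    """Attention-weighted (row, col) center of a normalized map."""
    cy = sum(i * v for i, row in enumerate(p) for v in row)
    cx = sum(j * v for row in p for j, v in enumerate(row))
    return cy, cx


def compactness(p: Grid) -> float:
    """Weighted spatial variance around the centroid; 0 for a single-pixel peak."""
    cy, cx = centroid(p)
    return sum(
        v * ((i - cy) ** 2 + (j - cx) ** 2)
        for i, row in enumerate(p)
        for j, v in enumerate(row)
    )


def dispersion(centers: List[Tuple[float, float]], margin: float) -> float:
    """Hinge penalty for every pair of attention centroids closer than `margin`."""
    loss = 0.0
    for a in range(len(centers)):
        for b in range(a + 1, len(centers)):
            loss += max(0.0, margin - math.dist(centers[a], centers[b]))
    return loss


def spatial_attention_loss(maps: List[Grid], margin: float = 2.0, weight: float = 1.0) -> float:
    """Hypothetical combined constraint: mean intra-map spread + weighted inter-map overlap."""
    probs = [normalize(m) for m in maps]
    intra = sum(compactness(p) for p in probs) / len(probs)
    inter = dispersion([centroid(p) for p in probs], margin)
    return intra + weight * inter
```

Two sharp maps peaked at opposite corners of a 4 x 4 grid give a loss of 0.0, while two identical sharp maps are penalized by the full margin (2.0), because their centroids coincide. A diffuse map is penalized by the compactness term instead.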