Multi-attention based semantic deep hashing for cross-modal retrieval

2021 
Cross-modal hashing is an efficient method for retrieving data across domains. Most previous methods focus on measuring the intra-modality and inter-modality discrepancy. However, recent research shows that semantic information is also vital for cross-modal retrieval. In the human visual system, people establish multi-modality connections by applying attention mechanisms to semantic information. Most previous attention-based methods simply apply single-modality attention, ignoring the effectiveness of multi-attention. Multi-attention consists of features drawn from different semantic representation spaces; to better bridge the semantic gap among modalities, it can guide the output features toward alignment via the attention mechanism. From this perspective, we propose a new cross-modal hashing method in this paper: 1) we design a multi-attention block to extract features affected by multi-attention; 2) we propose a correlative loss function to optimize the multi-attention matrix generated by the block, and to make the hash codes consistent and semantically correlated during subsequent generation. Experiments on three challenging benchmarks demonstrate the effectiveness of our method for cross-modal retrieval.
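To make the idea concrete, the following is a minimal NumPy sketch of the multi-attention notion described above: attention maps computed in each modality's semantic space are fused into a shared attention that reweights both modalities' features before hashing. All shapes, the averaging fusion, and the random sign-projection hashing are illustrative assumptions, not the paper's actual architecture (which is learned with the proposed correlative loss).

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical shapes: a batch of N items with d-dim features per modality.
N, d, hash_bits = 4, 8, 16

img_feat = rng.standard_normal((N, d))   # image-modality features
txt_feat = rng.standard_normal((N, d))   # text-modality features

# Per-modality attention over feature dimensions (a simplification of
# the paper's multi-attention block, whose exact design is not shown here).
att_img = softmax(img_feat, axis=1)
att_txt = softmax(txt_feat, axis=1)

# "Multi-attention": fuse the attention maps from both semantic spaces
# so that each modality's features are reweighted by a shared attention.
multi_att = 0.5 * (att_img + att_txt)

img_attended = img_feat * multi_att
txt_attended = txt_feat * multi_att

# Hash codes via a shared projection followed by sign (illustrative only;
# the paper instead learns this mapping under its correlative loss).
W = rng.standard_normal((d, hash_bits))
img_hash = np.sign(img_attended @ W)
txt_hash = np.sign(txt_attended @ W)

print(img_hash.shape, txt_hash.shape)
```

Because both modalities share the fused attention and the same projection, semantically matching image-text pairs are pushed toward similar binary codes, which is the alignment effect the abstract attributes to multi-attention.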