Description-based person search with multi-grained matching networks

2021 
Abstract

Description-based person search aims to retrieve images of a person from an image database given a textual description of that person. It is a challenging task because the visual images and the textual descriptions belong to different modalities. To fully capture the relevance between person images and textual descriptions, we propose a multi-grained framework with three branches for visual-textual matching. Specifically, in the global-grained branch, we extract global contexts from entire images and descriptions. In the fine-grained branch, we adopt visual human parsing and linguistic parsing to split images and descriptions into semantic components related to different body parts, and we design two attention mechanisms, segmentation-based and linguistics-based attention, to align visual and textual semantic components for fine-grained matching. To further exploit the spatial relations between fine-grained semantic components, the coarse-grained branch constructs a body graph and applies graph convolutional networks to aggregate fine-grained components into coarse-grained representations. The visual and textual representations learned by the three branches are complementary to each other, which enhances visual-textual matching performance. Experimental results on the CUHK-PEDES dataset show that our approach performs favorably against state-of-the-art description-based person search methods.
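The abstract only summarizes the coarse-grained branch. As a rough illustration of the idea, the minimal PyTorch sketch below shows how a graph-convolution layer over a fixed body graph can aggregate fine-grained part features into a coarse-grained representation. The five-node body graph, the edge set, the feature dimensions, and the mean pooling are all hypothetical choices for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BodyGraphConv(nn.Module):
    """One graph-convolution layer over a fixed body graph.

    Aggregates fine-grained body-part features using a symmetrically
    normalized adjacency matrix that encodes spatial relations
    between parts (a standard GCN propagation rule; the paper's
    exact formulation may differ).
    """
    def __init__(self, in_dim, out_dim, adj):
        super().__init__()
        # adj: (P, P) adjacency over P body-part nodes, with self-loops
        deg = adj.sum(dim=1)
        d_inv_sqrt = deg.pow(-0.5)
        # symmetric normalization: D^{-1/2} A D^{-1/2}
        self.register_buffer(
            "adj_norm", d_inv_sqrt.unsqueeze(1) * adj * d_inv_sqrt.unsqueeze(0)
        )
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x):
        # x: (batch, P, in_dim) fine-grained part features
        x = torch.einsum("pq,bqd->bpd", self.adj_norm, x)  # neighbor aggregation
        return F.relu(self.proj(x))

# Hypothetical 5-node body graph: head-torso, torso-arms, torso-legs
P = 5
adj = torch.eye(P)  # self-loops
for i, j in [(0, 1), (1, 2), (1, 3), (1, 4)]:
    adj[i, j] = adj[j, i] = 1.0

layer = BodyGraphConv(in_dim=256, out_dim=256, adj=adj)
parts = torch.randn(8, P, 256)     # batch of fine-grained part features
coarse = layer(parts).mean(dim=1)  # pool part nodes into one coarse vector
print(coarse.shape)                # torch.Size([8, 256])
```

In this sketch, each body-part node mixes information with its spatial neighbors before pooling, so the pooled vector reflects relations between parts rather than treating them independently.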