Adaptively Clustering-Driven Learning for Visual Relationship Detection

2020 
Visual relationship detection aims to describe the interactions between pairs of objects, such as the triplets person-ride-bike and bike-next-to-car. In practice, some groups of relationships are strongly correlated, while others are only weakly related. Intuitively, common relationships can be roughly categorized into three types: geometric (e.g., next to), possessive (e.g., has), or action (e.g., ride). However, previous studies ignore the discovery of relatedness among multiple relationships, relying instead on a single unified space that maps visual features or statistical dependencies into categories. To tackle this problem, we propose an adaptively clustering-driven network for visual relationship detection. First, we propose two novel modules that jointly discover the common distribution space and preserve the characteristics of individual relationships for clustering. Then, a fused inference is designed to integrate the group-induced information and the language prior to facilitate predicate inference. In particular, we design a Frobenius-norm regularization to boost discriminative clustering. To the best of our knowledge, the proposed method is the first supervised framework to realize subject-predicate-object relationship-aware clustering for visual relationship detection. Extensive experiments show that the proposed method achieves competitive performance against state-of-the-art methods on the Visual Genome dataset. Additional ablation studies further validate its effectiveness.
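
The abstract does not give the exact form of the Frobenius-norm regularizer, so the following is only a minimal sketch of one plausible reading: the regularizer acts on a row-stochastic soft cluster-assignment matrix S, where maximizing ||S||_F^2 pushes each row toward a one-hot (discriminative) assignment. The function name frobenius_regularizer and the (N, K) logits shape are hypothetical, not taken from the paper.

import torch
import torch.nn.functional as F

def frobenius_regularizer(logits: torch.Tensor) -> torch.Tensor:
    # logits: (N, K) scores of N candidate relationships over K
    # latent groups; the name and shape are assumptions, since the
    # abstract does not specify the formulation.
    S = F.softmax(logits, dim=1)  # soft assignment matrix, rows sum to 1
    # ||S||_F^2 is the sum of squared entries; for a row-stochastic S it
    # is largest when every row is one-hot, so minimizing its negative
    # pushes each sample toward a single, confident cluster.
    return -(S ** 2).sum() / S.shape[0]

# Example: a toy batch of 8 relationship features scored over 3 groups
# (e.g., geometric, possessive, action, mirroring the abstract's taxonomy).
logits = torch.randn(8, 3)
loss = frobenius_regularizer(logits)

Minimizing this term alongside the main clustering objective would sharpen the soft assignments without hard-fixing them, which is one way such a regularizer could "boost the discriminative clustering" described above.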