Semantically Similarity-Wise Dual-Branch Network for Scene Graph Generation

2022 
Scene graph generation aims to detect visual entities in an image and the relationships between them. Object-level visual information is of vital importance for predicting accurate relationships. However, most existing methods encode visual information under coarse supervision: by taking the cross-entropy function as the main training loss, they treat different relationships as mutually exclusive semantics with equal-distance labels. Intuitively, different relationship semantics naturally have their own similarities and dissimilarities at different distance levels, i.e., the topological information of relationship semantics. This information can serve as an inspiring hint that helps the model grasp the key relationship-related visual cues. Accordingly, we propose a Semantically Similarity-wise Dual-branch Network (SSDN), which introduces the topological information of relationship semantics as extra supervision to aid the extraction and encoding of relationship-related visual information. To avoid chaotic feature learning and to let the introduced knowledge be better absorbed during inference, we design a dual-branch framework consisting of an auxiliary branch and an inference branch. The topological information extracted from the ground truth is introduced at the front end of the auxiliary branch, which then generates a soft embedding that is propagated to the inference branch in a knowledge-distillation manner. Extensive experiments show that our model significantly outperforms state-of-the-art approaches on average on the Visual Genome and VRD benchmarks, demonstrating its effectiveness and superiority.
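As a rough illustration of the dual-branch distillation idea described in the abstract (not the authors' released code; the module names, layer sizes, fusion step, and the KL-based transfer loss below are all assumptions), the auxiliary/inference split might be sketched as follows, with the auxiliary branch acting as a teacher whose soft predictions guide the inference branch:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualBranchSketch(nn.Module):
    """Illustrative sketch only: an auxiliary branch that sees a relationship-semantics
    topology prior produces soft targets, which the inference branch mimics via a
    distillation loss. Feature and class dimensions are placeholders."""

    def __init__(self, feat_dim=512, num_rel=51):
        super().__init__()
        self.aux_branch = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.ReLU(), nn.Linear(feat_dim, num_rel)
        )
        self.inf_branch = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.ReLU(), nn.Linear(feat_dim, num_rel)
        )

    def forward(self, pair_feat, topo_prior=None):
        # Auxiliary branch: subject-object pair feature, optionally fused (here by a
        # simple addition, an assumed choice) with the topology prior derived from
        # ground-truth relationship semantics.
        aux_in = pair_feat if topo_prior is None else pair_feat + topo_prior
        soft_target = self.aux_branch(aux_in)          # "teacher" logits
        student_logits = self.inf_branch(pair_feat)    # inference-branch logits
        return student_logits, soft_target


def distillation_loss(student_logits, soft_target, labels, T=2.0, alpha=0.5):
    """Standard cross-entropy on hard labels plus KL divergence between the
    temperature-softened distributions of the two branches."""
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(soft_target.detach() / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * ce + (1 - alpha) * kd
```

In such a setup, only the inference branch would be run at test time, so the topology prior extracted from ground-truth annotations is never needed during inference, consistent with the auxiliary-branch design described above.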