Long-Tail Visual Relationship Recognition with Hubless Regularized RelMix

2020 
Several approaches have been proposed in recent literature to alleviate the long-tail problem, mostly for the object classification task. We propose to study the task of Long-Tail Visual Relationship Recognition (LTVRR), which aims at generalizing over the structured long-tail distribution of visual relationships (e.g., "rabbit grazing on grass"). In this setup, the subject, relation, and object classes each follow a long-tail distribution. We first introduce two large-scale long-tail visual relationship recognition benchmarks for this task, dubbed VG8K-LT (5,330 objects, 2,000 relations) and GQA-LT (1,703 objects, 310 relations), built upon the widely used Visual Genome and GQA datasets. In contrast to existing benchmarks, some classes appear at very low frequency (1–14 examples). We use these benchmarks to study the performance of several state-of-the-art long-tail models in the LTVRR setup. We develop a visiolinguistic hubless (ViLHub) loss that consistently encourages visual classifiers to be more predictive of tail classes while remaining accurate on the head. We also propose a relationship Mixup augmentation, dubbed RelMix, that improves tail performance on the VG8K-LT and GQA-LT benchmarks, with the best results achieved when it is combined with the ViLHub loss. Benchmarks and code will be made available.
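
The abstract does not spell out the form of the ViLHub loss, but hubless regularizers of this kind are typically implemented by penalizing how far the batch-averaged class probabilities drift from a uniform prior, which discourages a few "hub" head classes from dominating predictions. The following is a minimal PyTorch sketch under that assumption; the function name `vilhub_loss` and the weighting scheme are illustrative, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def vilhub_loss(logits: torch.Tensor) -> torch.Tensor:
    """Hubless regularizer (sketch): penalize the squared deviation of the
    batch-averaged class probabilities from a uniform prior, so that no
    class acts as a prediction 'hub' at the expense of tail classes."""
    probs = F.softmax(logits, dim=-1)      # (batch, num_classes)
    mean_probs = probs.mean(dim=0)         # average predicted mass per class
    uniform = torch.full_like(mean_probs, 1.0 / mean_probs.numel())
    return ((mean_probs - uniform) ** 2).sum()

# Assumed usage: add the regularizer to the standard classification loss,
# with a hypothetical weight `lam` tuned on a validation set.
# loss = F.cross_entropy(logits, targets) + lam * vilhub_loss(logits)
```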
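RelMix is described as a Mixup-style augmentation over relationship triplets. A plausible reading, sketched below, is to convexly combine the visual features and the one-hot subject/relation/object labels of two randomly paired examples; the exact mixing targets used in the paper may differ.

```python
import torch

def relmix(feats, subj_y, rel_y, obj_y, alpha: float = 1.0):
    """Mixup-style augmentation for relationship triplets (sketch):
    mix features and the one-hot subject/relation/object labels of
    each example with those of a randomly permuted partner."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(feats.size(0))
    mixed_feats = lam * feats + (1.0 - lam) * feats[perm]
    mix = lambda y: lam * y + (1.0 - lam) * y[perm]  # y: one-hot labels
    return mixed_feats, mix(subj_y), mix(rel_y), mix(obj_y)
```

Pairing a tail-class triplet with a head-class one in this way synthesizes training signal for rare subject-relation-object combinations, which is consistent with the reported tail gains.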