How Incompletely Segmented Information Affects Multi-Object Tracking and Segmentation (MOTS)

2020 
In recent years, deep learning has made dramatic advances in the computer vision field, especially in improving the performance of object detection and instance semantic segmentation. Still, multi-object tracking (MOT) remains a very challenging problem. Even with state-of-the-art deep learning-based object detectors, tracking-by-detection, a preferred paradigm for MOT, yields only slight improvements in tracking performance. Pixel-level information is considered more precise and more useful for improving tracking performance than conventional information, such as the foreground or background content of a bounding box. However, the performance of current state-of-the-art models for automatically annotating pixel-level information still falls far short of human expectations. We therefore explore how multi-object tracking and segmentation (MOTS) is affected when the information obtained from instance semantic segmentation is incomplete. We propose a mask-guided two-streamed augmentation learning (MGTSAL) algorithm, which can be applied to TrackR-CNN to alleviate the significant drop in MOTS performance when encountering incompletely segmented information. We evaluate the proposed approach on the KITTI MOTS dataset, and it outperforms the baseline TrackR-CNN model in all our experimental settings. The promising experimental results and ablation study validate the effectiveness of the proposed approach.