Online Multiple Person Tracking Using Fully-Convolutional Neural Networks and Motion Invariance Constraints

2018 
We propose a novel framework for multiple person tracking in crowded scenes with the tracking-by-detection paradigm. In such scenes, noisy detections and frequent occlusions are major challenges. A common way to handle these challenges is to use Convolutional Neural Network (CNN) based appearance features to discriminate objects. However, to obtain sufficiently discriminative features, CNNs demand a large amount of training data and sometimes compromise efficiency. We address these challenges in two ways. First, an Appearance Net modified from a Siamese network is proposed to identify persons in crowded scenes. Compared to other CNNs with deep layers and careful fine-tuning, our Appearance Net is efficient and sufficiently accurate without any fine-tuning. Second, a motion invariance model is designed to handle noisy detections caused by cluttered backgrounds or inaccurate bounding box localization, as well as missing objects caused by occlusions. By utilizing spatial geometric constraints, our tracker generates reliable trajectories in challenging scenes. Extensive experiments on the two largest multi-object tracking (MOT) benchmarks, namely MOT15 and MOT17, demonstrate competitive performance of the proposed tracker against a number of state-of-the-art trackers.
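As an illustration only, the sketch below shows how the two components described in the abstract might fit together: a lightweight Siamese-style appearance embedding that scores track-detection pairs by cosine similarity, and a simple constant-velocity gate standing in for the motion invariance constraints. This is not the authors' implementation; the layer sizes, function names, and the 50-pixel gating threshold are assumptions made for the example.

```python
# Minimal sketch (assumptions, not the paper's code) of appearance affinity
# plus a geometric motion gate for tracking-by-detection association.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AppearanceNet(nn.Module):
    """Fully-convolutional embedding branch shared by both Siamese inputs."""

    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, embed_dim, 3, stride=2, padding=1),
            nn.AdaptiveAvgPool2d(1),        # global pooling -> one vector per crop
        )

    def forward(self, crops: torch.Tensor) -> torch.Tensor:
        # crops: (N, 3, H, W) person detections resized to a fixed size
        emb = self.features(crops).flatten(1)
        return F.normalize(emb, dim=1)      # unit-length embeddings


def appearance_affinity(net: AppearanceNet,
                        track_crops: torch.Tensor,
                        det_crops: torch.Tensor) -> torch.Tensor:
    """Cosine-similarity affinity matrix between existing tracks and detections."""
    with torch.no_grad():
        t_emb = net(track_crops)            # (T, D)
        d_emb = net(det_crops)              # (N, D)
    return t_emb @ d_emb.t()                # (T, N), values in [-1, 1]


def motion_gate(track_centers: torch.Tensor,
                track_velocities: torch.Tensor,
                det_centers: torch.Tensor,
                max_center_dist: float = 50.0) -> torch.Tensor:
    """Boolean gate: a detection is admissible for a track only if it lies near
    the track's constant-velocity prediction (a stand-in for the paper's
    spatial geometric constraints)."""
    pred_centers = track_centers + track_velocities   # (T, 2)
    dist = torch.cdist(pred_centers, det_centers)     # (T, N)
    return dist < max_center_dist
```

In such a pipeline, the appearance affinity matrix masked by the motion gate would typically be passed to a Hungarian-style assignment step to link detections to existing trajectories; the exact association scheme used in the paper may differ.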