MOTR: End-to-End Multiple-Object Tracking with TRansformer
2021
The key challenge in the multiple-object tracking (MOT) task is temporal modeling
of the tracked objects. Existing tracking-by-detection methods adopt simple
heuristics, such as spatial or appearance similarity. Although widely used, such
methods are too simple to model complex variations, such as tracking through
occlusion. Inherently, existing methods lack the ability to learn temporal
variations from data. In this paper, we
present MOTR, the first fully end-to-end multiple-object tracking framework. It
learns to model the long-range temporal variation of the objects. It performs
temporal association implicitly and avoids previous explicit heuristics. Built
on Transformer and DETR, MOTR introduces the concept of "track query". Each
track query models the entire track of an object. It is transferred and updated
frame by frame, so that object detection and tracking are performed seamlessly.
A temporal aggregation network, combined with multi-frame training, is proposed
to model long-range temporal relations. Experimental results show that MOTR
achieves state-of-the-art performance. Code is available at
https://github.com/megvii-model/MOTR.
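
The frame-by-frame transfer of track queries described above can be illustrated with a small sketch. This is not the authors' implementation: the names `Query`, `step`, `toy_decoder`, and `keep_thresh` are hypothetical, the decoder is a stand-in that simply reads scores from a list, and the single score threshold used to keep or drop a query is an assumption made for illustration. The sketch also assumes a fixed set of "detect" queries that propose new objects, alongside the track queries carried over from the previous frame.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Query:
    embedding: List[float]
    track_id: int = -1  # -1 marks a detect query that has no identity yet


def toy_decoder(queries: Sequence[Query],
                frame: Sequence[float]) -> List[Tuple[float, List[float]]]:
    """Stand-in for a Transformer decoder (hypothetical).

    Here a 'frame' is just a list with one confidence score per query;
    a real decoder would attend to image features and also update the
    embedding. The embedding is returned unchanged.
    """
    return [(frame[i], q.embedding) for i, q in enumerate(queries)]


def step(track_queries: List[Query], detect_queries: List[Query],
         frame: Sequence[float], decoder, next_id: int,
         keep_thresh: float = 0.5) -> Tuple[List[Query], int]:
    """Process one frame and return the track queries for the next frame."""
    # Track queries come first so that identities carried over from the
    # previous frame are decoded together with the new-object proposals.
    queries = track_queries + detect_queries
    outputs = decoder(queries, frame)

    survivors: List[Query] = []
    for query, (score, embedding) in zip(queries, outputs):
        if score < keep_thresh:
            continue  # low score: track ends, or proposal is rejected
        track_id = query.track_id
        if track_id < 0:  # confident detect query starts a new track
            track_id = next_id
            next_id += 1
        # The updated embedding becomes the track query for the next frame,
        # so association happens implicitly through the query's identity.
        survivors.append(Query(embedding, track_id))
    return survivors, next_id
```

In this sketch no explicit matching step (for example IoU or appearance matching) links detections across frames: an identity persists only because the same query is passed forward.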