Deep People Detection: A Comparative Study of SSD and LSTM-decoder

Atiqur Rahman, Prince Kapoor, Robert Laganière, Daniel Laroche, Changyun Zhu, Xiaoyin Xu, Ali Osman Ors

2018

In this paper, we present a comparative study of two state-of-the-art object detection architectures: an end-to-end CNN-based framework called SSD [1] and an LSTM-based framework [2], which we refer to as LSTM-decoder. We study the two architectures in the context of people head detection on a few benchmark datasets containing a small to moderately large number of head instances that appear at varying scales and occlusion levels. To better capture the pros and cons of the two architectures, we apply them with several deep feature extractors (e.g., Inception-V2, Inception-ResNet-V2 and MobileNet-V1) and report the accuracy, speed and generalization ability of each approach. Our experimental results show that while the LSTM-decoder can be more accurate at detecting smaller head instances, especially in the presence of occlusions, the sheer detection speed and superior ability to generalize over multiple scales make SSD an ideal choice for real-time people detection.
