Multi-Scale Spatial Context Features Using 3-D Recurrent Neural Networks for Pedestrian Detection

2018 
Successful detection of pedestrians in an autonomous driving scene is challenging because of the high variation in pedestrian scales and the cluttered background, which requires context information. In this paper, we propose to use 3-D recurrent neural networks (RNNs) to extract rich spatial context information at different resolutions to improve the accuracy of pedestrian detection. Recurrent neural networks have been shown to improve performance on many tasks, such as speech recognition, by exploiting the dependency between inputs at different time steps. However, the chain structure of a conventional RNN is not well suited to 2-D image tasks. The proposed 3-D RNN has a multi-path structure that enables context aggregation not only between neighboring cells but also across feature maps of different resolutions and abstraction levels. The context extracted in this way is rich because both the spatial dependency within the same feature map and the information across different feature maps are taken into account. By combining the 3-D RNN with skip-pooling, this method can detect multi-scale pedestrians under complex driving scenarios. Performance evaluation shows that the proposed method is competitive, with an average precision of 73.04% on the KITTI dataset and a miss rate of 9.31% on the Caltech dataset.
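The core idea of aggregating spatial context both within a feature map and across feature maps of different resolutions can be illustrated with a minimal numpy sketch. This is not the authors' implementation: the directional recurrent passes, the fixed `decay` weight, and the nearest-neighbor upsampling via `np.kron` are simplifying assumptions standing in for learned RNN cells and learned fusion.

```python
import numpy as np

def directional_pass(fmap, axis, reverse=False, decay=0.5):
    # Accumulate recurrent context along one spatial axis of a 2-D
    # feature map; `decay` stands in for a learned recurrent weight.
    out = np.zeros_like(fmap)
    indices = range(fmap.shape[axis])
    if reverse:
        indices = reversed(list(indices))
    prev = None
    for i in indices:
        sl = [slice(None)] * fmap.ndim
        sl[axis] = i
        cur = fmap[tuple(sl)]
        h = cur if prev is None else cur + decay * prev
        out[tuple(sl)] = h
        prev = h
    return out

def spatial_context(fmap, decay=0.5):
    # Multi-path aggregation within one feature map: four directional
    # passes (down, up, right, left), averaged so every cell sees
    # context from all neighboring cells.
    passes = [directional_pass(fmap, axis, rev, decay)
              for axis in (0, 1) for rev in (False, True)]
    return sum(passes) / len(passes)

def multi_scale_context(fmaps, decay=0.5):
    # Aggregate context within each scale, then upsample coarser maps
    # to the finest resolution and fuse them, so cells also receive
    # information from feature maps at other abstraction levels.
    target_h, target_w = fmaps[0].shape
    fused = np.zeros((target_h, target_w))
    for fm in fmaps:
        ctx = spatial_context(fm, decay)
        ry, rx = target_h // fm.shape[0], target_w // fm.shape[1]
        fused += np.kron(ctx, np.ones((ry, rx)))  # nearest-neighbor upsample
    return fused / len(fmaps)
```

In the paper, the directional passes would be learned RNN cells and the cross-scale fusion would feed a detection head together with skip-pooled features; here the sketch only shows the information flow pattern.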