BVTNet: Multi-label Multi-class Fusion of Visible and Thermal Camera for Free Space and Pedestrian Segmentation

2020 
Deep learning-based semantic segmentation with visible cameras reports state-of-the-art segmentation accuracy. However, this approach is limited by the visible camera's susceptibility to varying illumination and environmental conditions. One way to address this limitation is visible-thermal camera sensor fusion. Existing literature applies this sensor fusion to object segmentation, but its application to free space segmentation has not been reported. Here, a multi-label multi-class visible-thermal camera learning framework, termed BVTNet, is proposed for the semantic segmentation of pedestrians and free space. BVTNet estimates pedestrians and free space in one multi-class output branch, and separately estimates the free space and pedestrian boundaries in another multi-class output branch. The boundary segmentation is integrated with the full semantic segmentation in a post-processing step. The proposed framework is validated on the public MFNet dataset. A comparative analysis with baseline algorithms and ablation studies with BVTNet variants show that the proposed framework achieves state-of-the-art segmentation accuracy in real time under challenging environmental conditions.
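To make the dual-branch idea concrete, the following is a minimal, hypothetical PyTorch sketch of the architecture the abstract describes: visible and thermal inputs are fused, one multi-class head predicts the full segmentation, a second multi-class head predicts class-wise boundaries, and a post-processing step combines the two. The class name BVTNetSketch, the early-fusion encoder, the layer sizes, and the boundary-damping rule are all illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the dual-branch visible-thermal design described above.
# The encoder, channel counts, and post-processing rule are assumptions for
# illustration only; they are not the published BVTNet implementation.
import torch
import torch.nn as nn

class BVTNetSketch(nn.Module):
    """Fuses a 3-channel visible image with a 1-channel thermal image and
    predicts two multi-class maps: full segmentation and boundaries."""

    def __init__(self, n_classes: int = 3):  # e.g. background, free space, pedestrian
        super().__init__()
        # Early fusion of visible (3 ch) and thermal (1 ch) inputs -- an assumption.
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Two parallel multi-class output branches, as in the abstract.
        self.seg_head = nn.Conv2d(64, n_classes, 1)       # full segmentation
        self.boundary_head = nn.Conv2d(64, n_classes, 1)  # class-wise boundaries

    def forward(self, visible: torch.Tensor, thermal: torch.Tensor):
        feats = self.encoder(torch.cat([visible, thermal], dim=1))
        return self.seg_head(feats), self.boundary_head(feats)

def fuse_with_boundaries(seg_logits, boundary_logits, threshold=0.5):
    """Illustrative post-processing: suppress segmentation confidence on
    predicted boundary pixels so labels snap to sharper region edges."""
    boundary_prob = torch.sigmoid(boundary_logits).amax(dim=1, keepdim=True)
    damped = seg_logits - (boundary_prob > threshold).float()
    return damped.argmax(dim=1)

if __name__ == "__main__":
    net = BVTNetSketch()
    vis = torch.randn(1, 3, 96, 96)   # visible camera frame
    th = torch.randn(1, 1, 96, 96)    # thermal camera frame
    seg, boundary = net(vis, th)
    labels = fuse_with_boundaries(seg, boundary)
    print(seg.shape, boundary.shape, labels.shape)
```

Keeping the two heads on a shared encoder is one plausible reading of "multi-label multi-class": each pixel receives both a region label and a boundary label, and the boundary branch only influences the final labels through the post-processing step.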