Fast Monocular Depth Estimation on an FPGA

2020 
Depth sensing is crucial for understanding 3D scenes on embedded systems such as home robots, self-driving cars, and drones. Monocular depth estimation, which produces pixel-wise depth from a single general-purpose camera, has attracted attention in recent years due to its reliability, low cost, and small footprint. Prior research based on Convolutional Neural Networks (CNNs) has achieved high accuracy and drawn increasing interest. However, a CNN requires a massive number of MAC (multiply-accumulate) operations and weights, so its latency is extremely long. To address this problem, we present hardware-oriented pruning for separable convolutions and an efficiently parallelized MAC unit. We introduce a filter-wise pruned DepthFCN and a novel FPGA architecture that exploits its sparsity. Moreover, dense convolutions and pruned separable convolutions are implemented on a shared convolutional circuit to achieve high hardware efficiency and a high degree of parallelism. We compare the proposed FPGA-based system with the Jetson TX2. The FPGA accelerator achieves 123.6 FPS with 0.3 W power consumption for a 256×256 image, and its accuracy is 76.2%. Compared with the mobile GPU, it is 1.5 times faster and its power consumption is 20 times lower. We demonstrate the fastest monocular depth estimation using a low-cost FPGA board suitable for embedded systems.
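As a rough illustration of the filter-wise pruning idea (a minimal sketch, not the authors' implementation; the layer shapes, the `keep_ratio` parameter, and the L1-norm ranking criterion are all assumptions for exposition), the snippet below zeroes entire output filters of the pointwise stage of a separable convolution. Pruning at whole-filter granularity keeps the surviving weights dense and regular, which is what lets a hardware MAC array skip pruned filters without irregular indexing.

```python
# Hypothetical sketch of filter-wise pruning for a 1x1 (pointwise)
# convolution layer. Not the paper's code: shapes, keep_ratio, and
# the L1-norm importance criterion are illustrative assumptions.
import numpy as np

def filter_wise_prune(pointwise_weights: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Zero out whole output filters of a pointwise convolution.

    pointwise_weights: shape (out_channels, in_channels), one row per filter.
    keep_ratio:        fraction of filters to keep, e.g. 0.5.
    """
    out_channels = pointwise_weights.shape[0]
    n_keep = max(1, int(out_channels * keep_ratio))
    # Rank filters by L1 norm; small-norm filters contribute least.
    norms = np.abs(pointwise_weights).sum(axis=1)
    keep = np.argsort(norms)[-n_keep:]
    mask = np.zeros(out_channels, dtype=bool)
    mask[keep] = True
    # Whole rows (filters) are zeroed, so the remaining weights stay dense.
    return pointwise_weights * mask[:, None]

# Example: 64 filters over 32 input channels, keep half of them.
w = np.random.randn(64, 32).astype(np.float32)
w_pruned = filter_wise_prune(w, keep_ratio=0.5)
print((np.abs(w_pruned).sum(axis=1) > 0).sum(), "filters remain")
```

Because sparsity is expressed per filter rather than per weight, a shared convolutional circuit can run dense convolutions and pruned separable convolutions with the same regular dataflow, simply iterating over fewer filters in the pruned case.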