Visual Affordance Detection Using an Efficient Attention Convolutional Neural Network

2021 
Abstract Visual affordance detection is an important problem in robotics and computer vision. This paper proposes a novel and practical convolutional neural network that adopts an encoder-decoder architecture for pixel-wise affordance detection. The encoder comprises two modules: a dilated residual network that serves as the backbone for feature extraction, and an attention mechanism that models long-range, multi-level dependency relations. The decoder consists of a novel up-sampling layer that maps the low-resolution encoder features to a high-resolution pixel-wise prediction map. In particular, integrating the attention mechanism into the network reduces the loss of salient details and improves the model's feature representations. Experiments conducted on the University of Maryland (UMD) dataset verify that the proposed network with the attention mechanism and up-sampling layer outperforms classical methods. The proposed method lays a foundation for subsequent research on multi-task learning with physical robots.
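The abstract's pipeline (low-resolution encoder features, refined by spatial attention over long-range dependencies, then up-sampled to a pixel-wise map) can be illustrated with a minimal sketch. This is not the paper's actual network: the function names, the NumPy-based non-local attention, and the nearest-neighbor up-sampling (standing in for the learned up-sampling layer) are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_self_attention(feat):
    # feat: (C, H, W) encoder feature map; a toy non-local attention
    # that aggregates context across all spatial positions
    C, H, W = feat.shape
    x = feat.reshape(C, H * W)            # flatten spatial dimensions
    attn = softmax(x.T @ x, axis=-1)      # (HW, HW) pairwise affinities
    out = x @ attn.T                      # long-range context aggregation
    return feat + out.reshape(C, H, W)    # residual connection

def upsample(feat, scale=2):
    # nearest-neighbor up-sampling, a stand-in for the learned layer
    return feat.repeat(scale, axis=1).repeat(scale, axis=2)

feat = np.random.rand(8, 4, 4)            # toy low-resolution encoder output
refined = spatial_self_attention(feat)    # attention-refined features
pred = upsample(refined, scale=4)         # pixel-wise resolution map
print(pred.shape)                          # (8, 16, 16)
```

In the paper's setting the per-channel maps of `pred` would be followed by a classification head producing one affordance label per pixel; here the sketch only shows the shape flow from encoder features to a full-resolution map.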