General Recurrent Attention Model for Jointly Multiple Object Recognition and Weakly Supervised Localization

2018 
Classical convolutional neural networks achieve excellent accuracy on computer vision tasks, but their computational cost is unsatisfactory, especially as networks grow deeper and input images grow larger. Models based on visual attention have shown advantages in handling spatial information while saving computational cost at inference time. These models are designed to imitate the human visual attention mechanism, but they cannot adapt their receptive scope to objects of different sizes. In this paper, a recurrent location and scope selection approach is proposed to improve attention efficiency, bringing the model closer to the human visual mechanism. We evaluate our model on a basic visual recognition task, where it outperforms the baselines and can provide approximate bounding boxes in a weakly supervised way.
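To illustrate how a glimpse defined by a location and a scope can also serve as an approximate bounding box, here is a minimal sketch. It is not the paper's implementation: the function names `glimpse_box` and `crop`, the normalized coordinate convention (center in [-1, 1], scope as a fraction of the image side), and the clipping rule are all assumptions made for illustration.

```python
def glimpse_box(cx, cy, scope, width, height):
    """Map a normalized glimpse to a pixel box (x0, y0, x1, y1).

    Assumed convention (hypothetical, not taken from the paper):
    cx, cy lie in [-1, 1] with (0, 0) at the image center, and
    scope in (0, 1] is the fraction of each image side covered.
    """
    # Convert the normalized center to pixel coordinates.
    px = (cx + 1.0) / 2.0 * width
    py = (cy + 1.0) / 2.0 * height
    # Half-extent of the glimpse along each axis.
    half_w = scope * width / 2.0
    half_h = scope * height / 2.0
    # Clip to the image so the box never leaves the image bounds.
    x0 = max(0, int(round(px - half_w)))
    y0 = max(0, int(round(py - half_h)))
    x1 = min(width, int(round(px + half_w)))
    y1 = min(height, int(round(py + half_h)))
    return x0, y0, x1, y1


def crop(image, box):
    """Crop a row-major image (list of rows) to the given pixel box."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]
```

In a recurrent attention model of this kind, a network would emit `(cx, cy, scope)` at each step; a small scope gives a cheap, tight crop for small objects, while a large scope covers large ones, and the box of the final glimpse can be read off as a rough localization.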