Action recognition in still images using a multi-attention guided network with weakly supervised saliency detection

2021 
Action recognition in still images is an interesting subject in computer vision. One of the most important problems in still image-based action recognition is the lack of temporal information; cluttered backgrounds and diverse objects make the recognition task even more challenging. However, each action image may contain several salient regions, and exploiting them could improve recognition performance. Moreover, because no unique and clear definition exists for salient regions in action recognition images, obtaining reliable ground-truth salient regions is highly challenging. This paper presents a multi-attention guided network with weakly supervised detection of multiple salient regions for action recognition. A teacher-student structure is used to guide the attention of the student model toward the salient regions. In the training phase, the teacher network, with a Salient Region Proposal (SRP) module, generates weakly supervised data for the student network. In the evaluation phase, the student network, with a Multi-ATtention (MAT) module, proposes multiple salient regions and predicts the action from the information found. The proposed method obtains a mean Average Precision (mAP) of 94.2% and 93.80% on the Stanford-40 Actions and PASCAL VOC2012 datasets, respectively. The experimental results, based on the ResNet-50 architecture, show that the proposed method outperforms existing ones on the Stanford-40 and VOC2012 datasets. We have also made a major modification to the BU101 dataset, which is now publicly available. The proposed method achieves an mAP of 90.16% on the new BU101 dataset.
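For readers unfamiliar with the reported metric, the following is a minimal sketch of how a mean Average Precision can be computed from per-class scores. It assumes the non-interpolated definition of average precision (the mean of the precision values at the rank of each positive sample), which may differ from the official evaluation protocols of the datasets named above. The function names and the toy inputs are illustrative and do not come from the paper.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP for one class (assumed definition, see lead-in)."""
    # Rank samples from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            # Precision over the top `rank` samples, taken at each positive.
            precision_sum += hits / rank
    # A class with no positive samples contributes an AP of 0 here.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(
    scores_per_class: Sequence[Sequence[float]],
    labels_per_class: Sequence[Sequence[int]],
) -> float:
    """Average the per-class AP values over all classes."""
    if len(scores_per_class) == 0:
        raise ValueError("at least one class is required")
    aps = [
        average_precision(s, l)
        for s, l in zip(scores_per_class, labels_per_class)
    ]
    return sum(aps) / len(aps)


# Toy example with two hypothetical action classes and four images.
toy_scores = [[0.9, 0.8, 0.7, 0.6], [0.9, 0.1, 0.2, 0.3]]
toy_labels = [[1, 0, 1, 0], [1, 0, 0, 0]]
toy_map = mean_average_precision(toy_scores, toy_labels)
```

In the toy example the first class has positives at ranks 1 and 3, giving an AP of (1 + 2/3) / 2, and the second class is ranked perfectly, giving an AP of 1; the mAP is the mean of the two.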