An Investigation into Performance Factors of Two-Stream I3D Networks

2021 
The Two-Stream Inflated 3D ConvNet (I3D) is built on 2D convolutional networks whose filters are inflated into 3D, so that the network can extract spatiotemporal features from videos and classify them. I3D is an efficient solution for video action recognition, and it achieves outstanding results when initialized with a model pre-trained on the Kinetics dataset. This paper discusses ideas for improving the performance of I3D networks. Rather than modifying the network architecture, the work focuses on two aspects: 1) data pre-processing, including training-data cleansing and augmentation, where a range of augmentation schemes is investigated to improve the balance and regularity of the input data in both the training and testing phases (an idea inspired by the original I3D model and its proposers); and 2) the network backbone, where ResNet-50 is applied as an alternative backbone to gain better insight into the key performance factors of Two-Stream I3D networks. Experimental results show that the proposed hybrid improvement strategy substantially improves recognition accuracy on both benchmark and practical datasets.
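The inflation step mentioned above can be illustrated with a minimal sketch. It assumes the bootstrapping rule from the original I3D work: a pre-trained 2D kernel is repeated along a new temporal axis and divided by the temporal size, so that a static ("boring") clip yields the same response as the 2D filter applied to a single frame. This is not code from this paper; the function name `inflate_conv_weight` is hypothetical, NumPy arrays stand in for framework tensors, and the `(out_channels, in_channels, kH, kW)` weight layout is an assumption.

```python
import numpy as np


def inflate_conv_weight(w2d: np.ndarray, time_dim: int) -> np.ndarray:
    """Hypothetical helper: inflate a 2D kernel (out_c, in_c, kH, kW) into a
    3D kernel (out_c, in_c, time_dim, kH, kW).

    The 2D weights are copied to every temporal slice and scaled by
    1 / time_dim, so summing over the temporal axis recovers w2d.
    """
    if time_dim < 1:
        raise ValueError("time_dim must be a positive integer")
    # Insert a temporal axis at position 2, then tile along it.
    w3d = np.repeat(w2d[:, :, np.newaxis, :, :], time_dim, axis=2)
    # Rescale so a static clip produces the same activation as one frame.
    return w3d / time_dim


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w2d = rng.standard_normal((64, 3, 7, 7))      # e.g. a 7x7 RGB stem filter
    w3d = inflate_conv_weight(w2d, time_dim=7)
    print(w3d.shape)                               # (64, 3, 7, 7, 7)

    # A "boring" clip: one RGB patch repeated over 7 frames, laid out (C, T, H, W).
    patch = rng.standard_normal((3, 7, 7))
    clip = np.repeat(patch[:, np.newaxis], 7, axis=1)

    # Response of each output channel at a single spatial/temporal position.
    resp3d = np.einsum("octhw,cthw->o", w3d, clip)
    resp2d = np.einsum("ochw,chw->o", w2d, patch)
    print(np.allclose(resp3d, resp2d))             # True
```

The 1/T rescaling is the design choice that makes pre-trained 2D weights a sensible starting point: without it, the inflated filter's response to a static clip would be T times larger than the 2D filter's response, shifting the activation statistics the pre-trained layers expect.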