Fast Environmental Sound Classification based on Convolutional Neural Network Pruning Algorithm

2021 
As an important component of non-speech audio classification, environmental sound classification (ESC) has attracted increasing attention from researchers in recent years. Benefiting from the rapid development of deep learning, researchers feed manually extracted audio features into a convolutional neural network (CNN) to extract deeper abstract features for the final classification task, pushing ESC accuracy to a higher level. However, these accuracy gains usually rely on ever-deeper network structures, which introduce considerable parameter redundancy. The large number of floating-point operations (FLOPs) also slows down CNN inference and increases the burden on storage and computing resources. To this end, we propose a convolutional neural network pruning technique that compresses the CNN model and reduces its parameters and FLOPs, thereby eliminating redundancy. Specifically, we use ResNet-20 as the backbone network. We first pre-train the network so that it achieves good classification performance, then randomly remove a small number of convolution channels and fine-tune the network to restore accuracy, iterating this process until the model reaches the target compression ratio. We conduct experiments on the UrbanSound8K dataset. Thanks to the strong plasticity of the CNN model, the pruned model shows no significant drop in accuracy; at low compression rates, accuracy even improves slightly because network redundancy is reduced. Our model achieves accuracy competitive with state-of-the-art methods while being considerably lighter.
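The iterative prune-then-fine-tune loop described above can be illustrated with a minimal PyTorch-style sketch. This is not the authors' implementation: the function names, the pruning step size, and the fine-tuning hyper-parameters are illustrative assumptions, and PyTorch's built-in structured-pruning utilities are used as a stand-in for randomly removing convolution channels.

```python
# Minimal sketch (assumed PyTorch implementation, not the paper's code):
# alternate small random channel-pruning steps with fine-tuning until the
# cumulative fraction of pruned channels reaches the target compression ratio.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

def prune_step(model, amount=0.05):
    """Randomly zero out a small fraction of output channels in every conv layer."""
    for module in model.modules():
        if isinstance(module, nn.Conv2d):
            prune.random_structured(module, name="weight", amount=amount, dim=0)

def fine_tune(model, loader, epochs=2, lr=1e-3, device="cpu"):
    """Short fine-tuning pass to recover accuracy after each pruning step."""
    model.train()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()

def iterative_prune(model, loader, target_ratio=0.5, step=0.05):
    """Repeat pruning and fine-tuning until the target compression ratio is met."""
    pruned = 0.0
    while pruned < target_ratio:
        prune_step(model, amount=step)
        fine_tune(model, loader)
        # Each step prunes `step` of the remaining channels, so track the total.
        pruned = 1.0 - (1.0 - pruned) * (1.0 - step)
    # Make the pruning masks permanent on the conv weights.
    for module in model.modules():
        if isinstance(module, nn.Conv2d):
            prune.remove(module, "weight")
    return model
```

In this sketch the pruned channels are only masked to zero; physically removing them (and the corresponding FLOPs) would additionally require rebuilding the conv layers with fewer filters, which is omitted here for brevity.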