Capturing Temporal Dependencies Through Future Prediction for CNN-Based Audio Classifiers

2021 
This paper addresses temporal dependency modeling in CNN-based models for audio classification. Rather than relying on a purely architecture-induced method, we explicitly encode temporal dependencies into CNN-based audio classifiers. Specifically, in addition to the classification objective, we require the CNN model to solve an auxiliary task of predicting future features, formulated with the Contrastive Predictive Coding (CPC) loss. Furthermore, we propose a novel hierarchical CPC (HCPC) model that captures temporal dependencies at multiple levels simultaneously. The proposed model is evaluated on a wide range of non-speech audio signals, including musical and in-the-wild environmental audio. We show that the proposed approach consistently improves the backbone CNNs on all tested benchmark datasets and outperforms a DenseNet model trained from scratch.
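The combination of a classification objective with a CPC-style future-prediction objective can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names, the bilinear scoring matrix `W`, the use of in-batch negatives, and the loss weight `weight` are all assumptions made for illustration, and the hierarchical (HCPC) variant is not covered.

```python
import numpy as np

def log_softmax(logits, axis=-1):
    # Numerically stable log-softmax.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def info_nce_loss(context, future, W):
    """CPC-style contrastive loss (InfoNCE) with in-batch negatives.

    context: (N, d_c) summaries of past frames (hypothetical shape).
    future:  (N, d_z) features of the true future frames.
    W:       (d_c, d_z) bilinear prediction matrix (assumed scoring form).
    Row i of `future` is the positive for row i of `context`;
    all other rows act as negatives.
    """
    predicted = context @ W            # (N, d_z) predicted future features
    scores = predicted @ future.T      # (N, N) similarity of every pair
    log_probs = log_softmax(scores, axis=1)
    return -np.mean(np.diag(log_probs))

def cross_entropy(class_logits, labels):
    # Standard classification loss over integer class labels.
    log_probs = log_softmax(class_logits, axis=1)
    return -np.mean(log_probs[np.arange(len(labels)), labels])

def joint_loss(class_logits, labels, context, future, W, weight=1.0):
    # Classification objective plus weighted auxiliary prediction objective.
    # `weight` is a hypothetical hyperparameter, not taken from the paper.
    return cross_entropy(class_logits, labels) + weight * info_nce_loss(context, future, W)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, d_c, d_z, num_classes = 8, 16, 12, 5
    loss = joint_loss(
        rng.normal(size=(N, num_classes)),
        rng.integers(0, num_classes, size=N),
        rng.normal(size=(N, d_c)),
        rng.normal(size=(N, d_z)),
        rng.normal(size=(d_c, d_z)) * 0.1,
    )
    print(float(loss))
```

In this sketch the contrastive term is minimized when the predicted future feature scores higher against its true future than against the futures of other examples, which is one way an encoder can be pushed to retain temporally predictive information.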