Ensemble of handcrafted and deep features for urban sound classification

2021 
Abstract Urban sound classification depends strongly on feature extraction. In this paper, we present a compact and effective representation that characterizes different urban sounds by combining deep and handcrafted features. To this end, we propose a CNN model with a small parameter space to extract deep features, which are combined with handcrafted features extracted from the audio signals. We then apply a feature selection step to reduce the feature dimensionality and to identify the handcrafted features that enrich the deep features and better discriminate between urban sounds. The feature selection experiments indicate that associating perceptual, static, and physical features with deep features improves classification performance and allows a dimensionality reduction of up to 62.32% for the combined descriptors. The proposed descriptors achieve a classification accuracy of 86.2% on the ESC (urban noises) dataset and 96.16% on the UrbanSound8K dataset, outperforming most state-of-the-art CNN models for urban sound classification.
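The pipeline outlined in the abstract (concatenate deep and handcrafted features, then select a subset) can be illustrated with a minimal sketch. The abstract does not name the selection criterion, so the Fisher-style score below is an assumption chosen for illustration, not the authors' method. The toy data, the function names (`combine`, `fisher_scores`, `select_top_k`, `reduction_percent`), and the feature counts are all hypothetical.

```python
import random
from statistics import mean, pvariance


def combine(deep, handcrafted):
    """Concatenate per-clip deep and handcrafted feature vectors."""
    return [d + h for d, h in zip(deep, handcrafted)]


def fisher_scores(X, y):
    """Score each feature column as the variance of the class means
    divided by the mean within-class variance (assumed criterion)."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        cols = [[x[j] for x, label in zip(X, y) if label == c] for c in classes]
        between = pvariance([mean(col) for col in cols])
        within = mean([pvariance(col) for col in cols])
        scores.append(between / (within + 1e-12))
    return scores


def select_top_k(X, y, k):
    """Keep the k highest-scoring columns; return reduced data and kept indices."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:k])
    return [[x[j] for j in keep] for x in X], keep


def reduction_percent(n_before, n_after):
    """Dimensionality reduction as a percentage of the original size."""
    return 100.0 * (n_before - n_after) / n_before


# Toy data (hypothetical): two classes, 20 clips each.
rng = random.Random(0)
y = [0] * 20 + [1] * 20
# Two "deep" features per clip; only the first depends on the class.
deep = [[rng.gauss(3.0 * label, 1.0), rng.gauss(0.0, 1.0)] for label in y]
# Three "handcrafted" features per clip; only the second depends on the class.
hand = [
    [rng.gauss(0.0, 1.0), rng.gauss(-3.0 * label, 1.0), rng.gauss(0.0, 1.0)]
    for label in y
]

X = combine(deep, hand)              # 5 features per clip
X_sel, kept = select_top_k(X, y, k=2)
print("kept feature indices:", kept)
print("reduction (%):", reduction_percent(len(X[0]), len(X_sel[0])))
```

In this toy setting the selected subset mixes one deep and one handcrafted column, which mirrors the abstract's point that handcrafted features can complement deep ones while the combined descriptor shrinks.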