HRTF-Based Data Augmentation Method for Acoustic Scene Classification

2021 
In acoustic scene classification (ASC), the variety of recording devices poses a technical problem that remains unsolved. The amount of data recorded by different devices is usually unbalanced, and a model trained on audio collected by one device transfers poorly to another. To improve cross-device performance, this paper proposes a data augmentation method for ASC systems that take monaural audio samples as input, in which head-related transfer functions (HRTFs) are used to add artificial spatial information to the monaural samples. The proposed method enables ASC systems to imitate the ability of human binaural hearing to distinguish spatial orientation and lock onto specific sound sources. Experimental results show that, with the proposed method, the VGGNet and ResNet systems achieve 13.4% and 14.4% higher accuracy, respectively, than the DCASE 2020 baseline in cross-device ASC.
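The abstract does not spell out the augmentation pipeline, so the following is only a minimal sketch of the general idea of adding spatial cues to a mono signal: the signal is convolved with a left-ear and a right-ear head-related impulse response (HRIR, the time-domain counterpart of an HRTF). The function names and the toy HRIR values are hypothetical and chosen for illustration; they are not measured data and not the authors' implementation. How the resulting two-channel signal is fed to a monaural-input classifier is not stated in the abstract and is left out here.

```python
from typing import List, Sequence, Tuple


def convolve(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Full linear convolution; output length is len(signal) + len(kernel) - 1."""
    if not signal or not kernel:
        raise ValueError("signal and kernel must be non-empty")
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def spatialize_mono(
    mono: Sequence[float],
    hrir_left: Sequence[float],
    hrir_right: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Return (left, right) channels for one source direction.

    Each channel is the mono signal filtered by the HRIR of that ear.
    """
    if len(hrir_left) != len(hrir_right):
        raise ValueError("left and right HRIRs must have the same length")
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Hypothetical toy HRIR pair (an assumption, not measured data): the right ear
# hears the source two samples later and at half the amplitude, a crude
# stand-in for the interaural time and level differences of a source on the left.
TOY_HRIR_LEFT = [1.0, 0.0, 0.0]
TOY_HRIR_RIGHT = [0.0, 0.0, 0.5]

left, right = spatialize_mono([1.0, 2.0, 3.0], TOY_HRIR_LEFT, TOY_HRIR_RIGHT)
```

In practice the nested loops would usually be replaced by a library routine such as `numpy.convolve`, and the toy filters by HRIRs from a measured dataset, with one augmented copy generated per chosen source direction.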