Audio event recognition based on DBN features from multiple filter-bank representations

2015 
In audio event classification and detection research, the representation of the audio itself is important. Many researchers have applied Deep Belief Networks (DBNs) to learn new representations of audio. The mel filter-bank feature, which is derived from the mel scale, is commonly used as the low-level representation of the audio in the pre-processing stage of a DBN. However, the mel bands used in the mel filter-bank feature may not be sufficient to represent the diverse audio events of the real world comprehensively, which in turn makes it difficult for a DBN to learn good audio features. In this paper, we take two steps to explore and tackle this problem. In the first step, we compare how different arrangements of frequency bands affect DBN feature learning for audio event recognition. The arrangements considered are mel bands, bark bands, linear bands and pyramid bands. In the second step, to exploit the different classification capabilities of the resulting DBN features on different audio events, we adopt the Adaboost algorithm to fuse them. We conduct experiments on real datasets collected from the findsound website, and the results verify that our proposed audio event classification system, which uses diverse features selected by Adaboost from all sets of DBN features, outperforms a system that uses only one kind of DBN feature set.
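To make the idea of different band arrangements concrete, the following is a minimal sketch of how mel, bark and linear triangular filter banks could be built and applied to a power spectrum before DBN training. It is an illustration under stated assumptions, not the authors' implementation: the function names, the number of bands, the FFT size, the sample rate and the choice of Traunmüller's bark approximation are all assumed here, and pyramid bands are left out because the abstract does not define them.

```python
import numpy as np

# Frequency-warping functions. The mel formula is the common
# 2595*log10(1 + f/700) variant; the bark formula is Traunmueller's
# approximation. Both choices are assumptions, not taken from the paper.
def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def hz_to_bark(f):
    f = np.asarray(f, dtype=float)
    return 26.81 * f / (1960.0 + f) - 0.53

def bark_to_hz(z):
    z = np.asarray(z, dtype=float)
    return 1960.0 * (z + 0.53) / (26.28 - z)

def identity(x):
    return np.asarray(x, dtype=float)

# Each arrangement is a (forward, inverse) pair of warping functions.
SCALES = {
    "mel": (hz_to_mel, mel_to_hz),
    "bark": (hz_to_bark, bark_to_hz),
    "linear": (identity, identity),
}

def band_edges(scale, n_bands, f_min, f_max):
    """Return n_bands + 2 edge frequencies (Hz), equally spaced on the warped scale."""
    forward, inverse = SCALES[scale]
    points = np.linspace(forward(f_min), forward(f_max), n_bands + 2)
    return inverse(points)

def triangular_filterbank(scale, n_bands, n_fft, sample_rate, f_min=0.0, f_max=None):
    """Build an (n_bands, n_fft // 2 + 1) matrix of triangular filters."""
    if f_max is None:
        f_max = sample_rate / 2.0
    edges = band_edges(scale, n_bands, f_min, f_max)
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_bands, fft_freqs.size))
    for i in range(n_bands):
        low, center, high = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - low) / (center - low)
        falling = (high - fft_freqs) / (high - center)
        # Triangle: the smaller of the two slopes, clipped at zero.
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def log_filterbank_features(power_spectrum, bank, eps=1e-10):
    """Map (n_frames, n_fft // 2 + 1) power spectra to log band energies."""
    return np.log(power_spectrum @ bank.T + eps)
```

With these helpers, one low-level feature set per arrangement could be produced by calling `triangular_filterbank("mel", ...)`, `triangular_filterbank("bark", ...)` and `triangular_filterbank("linear", ...)` on the same spectra; each set would then feed its own DBN, and a boosting stage could select among the resulting DBN features.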