Multiclass Digital Audio Segmentation with MFCC Features using Naive Bayes and SVM Classifiers

2019 
In digital audio processing, classifying audio segments is a crucial pre-processing step for more complex tasks such as automatic speech recognition or music genre classification. In this study, we investigate the use of bag of audio words, Naive Bayes, and Support Vector Machines with a linear kernel to classify audio segments into one of three main classes: silence, speech, and music. In addition, we compare the effect of using Mel Frequency Cepstral Coefficients (MFCC) and their first and second derivatives as features for both segmentation algorithms. Tests were carried out on a sample of call recordings from our call center database. The results, presented as accuracy scores and Receiver Operating Characteristic (ROC) curves, reveal the best use case for the chosen combinations of features and segmentation algorithms.
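The feature side of such a pipeline can be illustrated with a minimal sketch. It is not the implementation used in the study: it assumes MFCC frames have already been extracted, approximates the derivatives with a simple central difference over a one-frame window, and quantizes frames against a small fixed codebook, where a real bag-of-audio-words system would typically learn the codebook by clustering. The names `delta`, `stack_features`, and `bag_of_audio_words` are hypothetical.

```python
from math import dist


def delta(frames):
    """Approximate the time derivative of each coefficient.

    Assumption: central difference over a +/-1 frame window,
    with the first and last frames replicated at the edges.
    """
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def stack_features(frames):
    """Concatenate static coefficients with first and second derivatives."""
    d1 = delta(frames)
    d2 = delta(d1)
    return [c + a + b for c, a, b in zip(frames, d1, d2)]


def bag_of_audio_words(frames, codebook):
    """Map a segment to a normalized histogram of nearest codewords."""
    counts = [0] * len(codebook)
    for frame in frames:
        nearest = min(range(len(codebook)), key=lambda k: dist(frame, codebook[k]))
        counts[nearest] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


# Toy data standing in for MFCC frames (3 frames, 2 coefficients each).
toy_frames = [[0.0, 1.0], [2.0, 1.0], [4.0, 3.0]]
# Hypothetical two-word codebook in the same 2-dimensional space.
toy_codebook = [[0.0, 1.0], [4.0, 3.0]]

stacked = stack_features(toy_frames)
histogram = bag_of_audio_words(toy_frames, toy_codebook)
```

The resulting fixed-length histogram (or the stacked frame vectors themselves) is the kind of input that a Naive Bayes or linear SVM classifier would then label as silence, speech, or music.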