Speech/music discrimination based on wavelets for broadcast programs

Emmanuel Didiot, Irina Illina, Odile Mella, Dominique Fohr, Jean Paul Haton

2018

Speech/music discrimination is a challenging research problem that significantly impacts Automatic Speech Recognition (ASR) performance. This paper proposes new features for the speech/music discrimination task. We propose to use a wavelet-based decomposition of the audio signal, which is well suited to the analysis of non-stationary signals such as speech or music. We compute different types of energy in each frequency band obtained from the wavelet decomposition. Two class/non-class classifiers are used: one for speech/non-speech and one for music/non-music. On the broadcast test corpus, the proposed wavelet approach gives better results than the MFCC-based one. For instance, we obtain a significant relative improvement of 39% in the error rate for the speech/music discrimination task.
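To make the feature idea concrete, here is a minimal sketch of computing one energy value per wavelet band for a single audio frame. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not name the wavelet family, the number of decomposition levels, the frame length, or the exact energy types, so the Haar wavelet, the three levels, and the log mean energy used here are all assumptions, and the function names `haar_dwt` and `wavelet_band_energies` are hypothetical.

```python
import math

def haar_dwt(signal):
    """One level of the orthonormal Haar transform.

    Returns (approximation, detail) coefficient lists. A trailing
    unpaired sample is dropped. The Haar wavelet is an assumption
    made for this sketch.
    """
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail

def wavelet_band_energies(frame, levels=3):
    """Log mean energy of each band of a multi-level decomposition.

    Output order: detail bands from finest to coarsest, then the final
    approximation band. The frame length is assumed to be at least
    2**levels so that no band is empty. Log mean energy is one
    plausible energy type, chosen here as an assumption.
    """
    bands = []
    approx = list(frame)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        bands.append(detail)
    bands.append(approx)
    eps = 1e-12  # avoids log(0) on silent bands
    return [math.log(sum(c * c for c in band) / len(band) + eps)
            for band in bands]

# Example: a 32-sample frame yields levels + 1 = 4 features.
frame = [math.sin(2.0 * math.pi * 4.0 * n / 32.0) for n in range(32)]
features = wavelet_band_energies(frame, levels=3)
```

In a setup like the one the abstract describes, one such feature vector per frame would be passed to each of the two class/non-class classifiers (speech/non-speech and music/non-music); the classifiers themselves are outside the scope of this sketch.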

Keywords:

Speech processing
Frequency band
Mel-frequency cepstrum
Speech recognition
Stationary process
Voice activity detection
Wavelet
Audio signal
Word error rate
Computer science
Pattern recognition
Artificial intelligence
Broadcasting

