Investigating the Use of Semi-Supervised Convolutional Neural Network Models for Speech/Music Classification and Segmentation

David Doukhan, Jean Carrive

2017

A convolutional neural network (CNN) architecture, trained with a semi-supervised strategy, is proposed for speech/music classification (SMC) and segmentation (SMS). It is compared to baseline machine learning algorithms on three SMC corpora, where it achieves superior performance together with perfect media-level speech recall. The evaluation corpora include speech-over-music segments with durations ranging from 3 to 30 seconds. Early SMS results are also presented. Segmentation errors are associated with musical genres not covered in the training database and/or with acoustic properties close to those of speech. These experiments are intended to help design new annotated speech/music resources and evaluation protocols suited to TV and radio stream indexing.
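To make the two evaluation notions above concrete, the following is a minimal, hypothetical sketch; the abstract does not describe the authors' post-processing or scoring code. It assumes that the classifier emits one "speech" or "music" label per fixed-hop frame, that segmentation simply merges runs of identical frame labels, and that "media-level speech recall" means the fraction of media files containing speech in which at least one speech segment is detected. The names `frames_to_segments`, `detects_speech` and `media_level_speech_recall` are illustrative only.

```python
from itertools import groupby
from typing import List, Tuple

# Hypothetical segment representation: (label, start_seconds, end_seconds)
Segment = Tuple[str, float, float]


def frames_to_segments(labels: List[str], hop_s: float) -> List[Segment]:
    """Merge consecutive frames sharing a label into contiguous segments.

    Assumes each frame covers hop_s seconds and frames do not overlap.
    """
    segments: List[Segment] = []
    start_frame = 0
    for label, run in groupby(labels):
        n_frames = sum(1 for _ in run)
        end_frame = start_frame + n_frames
        segments.append((label, start_frame * hop_s, end_frame * hop_s))
        start_frame = end_frame
    return segments


def detects_speech(segments: List[Segment]) -> bool:
    """A media file counts as 'speech detected' if any segment is labeled speech."""
    return any(label == "speech" for label, _, _ in segments)


def media_level_speech_recall(has_speech: List[bool], detected: List[bool]) -> float:
    """Fraction of media files that contain speech where speech was detected.

    Assumed definition: files without speech in the ground truth are ignored.
    """
    outcomes = [d for truth, d in zip(has_speech, detected) if truth]
    if not outcomes:
        raise ValueError("no media file contains speech in the ground truth")
    return sum(outcomes) / len(outcomes)


if __name__ == "__main__":
    frames = ["music", "music", "speech", "speech", "speech", "music"]
    segs = frames_to_segments(frames, hop_s=0.5)
    print(segs)  # [('music', 0.0, 1.0), ('speech', 1.0, 2.5), ('music', 2.5, 3.0)]
    print(detects_speech(segs))  # True
    print(media_level_speech_recall([True, True, False, True],
                                    [True, True, True, False]))
```

Under these assumptions, a perfect media-level speech recall of 1.0 only says that no speech-bearing file was missed entirely; it says nothing about boundary accuracy within a file, which is what the SMS evaluation targets.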

Keywords:

Music information retrieval
Convolutional neural network
Artificial intelligence
Recall
Architecture
Segmentation
Pattern recognition
Indexation
Computer science
Audio segmentation
Speech recognition
