Comparative Assessment of Data Augmentation for Semi-Supervised Polyphonic Sound Event Detection
2020
In the context of audio ambient intelligence systems in Smart Buildings, polyphonic Sound Event Detection aims to detect, localize, and classify any sound event recorded in a room. Today, most models are based on Deep Learning and require large databases for training. We propose a CRNN system that exploits unlabeled data through semi-supervised learning based on the "Mean Teacher" method, combined with data augmentation to overcome the limited size of the training dataset and to further improve performance. This model was submitted to the DCASE 2019 challenge and ranked second out of 58 submitted systems. In the present study, several conventional data augmentation techniques are compared: time shifting, frequency shifting, and background noise addition. It is shown that data augmentation with time shifting and noise addition, in combination with class-dependent median filtering, improves performance by 9%, leading to an event-based F1-score of 43.2% on the DCASE 2019 validation set. However, these tools rely on a coarse model (i.e., random variation of the data) of the intra-class variability observed in real life. Injecting acoustic knowledge into the design of augmentation methods seems a promising way forward, leading us to propose physics-inspired modelling strategies for future work.
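The abstract compares three conventional augmentation strategies. As a minimal sketch of what such transforms can look like on log-mel spectrogram features, the snippet below implements time shifting, frequency shifting, and noise addition with NumPy only. The function names, parameter ranges, and the use of Gaussian noise as a stand-in for recorded background noise are illustrative assumptions, not the authors' exact implementation.

```python
# Hypothetical sketch of the three augmentation strategies discussed in the abstract,
# applied to a log-mel spectrogram of shape (n_mels, n_frames). Parameter values and
# the Gaussian-noise stand-in are assumptions for illustration only.
import numpy as np

def time_shift(spec: np.ndarray, max_shift_frames: int = 16) -> np.ndarray:
    """Circularly shift the spectrogram along the time axis by a random offset.
    Frame-level event labels would need to be rolled by the same offset."""
    shift = np.random.randint(-max_shift_frames, max_shift_frames + 1)
    return np.roll(spec, shift, axis=1)

def frequency_shift(spec: np.ndarray, max_shift_bins: int = 4) -> np.ndarray:
    """Shift the spectrogram along the mel-frequency axis, padding the vacated
    bins with the minimum value so no spurious energy is introduced."""
    shift = np.random.randint(-max_shift_bins, max_shift_bins + 1)
    shifted = np.full_like(spec, spec.min())
    if shift > 0:
        shifted[shift:, :] = spec[:-shift, :]
    elif shift < 0:
        shifted[:shift, :] = spec[-shift:, :]
    else:
        shifted = spec.copy()
    return shifted

def add_background_noise(spec: np.ndarray, snr_db: float = 30.0) -> np.ndarray:
    """Add Gaussian noise scaled to a target signal-to-noise ratio (in dB);
    in practice, real background recordings could be mixed in instead."""
    signal_power = np.mean(spec ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = np.random.normal(0.0, np.sqrt(noise_power), size=spec.shape)
    return spec + noise

if __name__ == "__main__":
    # Example: augment a dummy 64-mel, 500-frame log-mel spectrogram.
    log_mel = np.random.randn(64, 500)
    augmented = add_background_noise(time_shift(log_mel))
    print(augmented.shape)  # (64, 500)
```

In a semi-supervised setup such as Mean Teacher, transforms like these are typically applied on the fly to both labeled and unlabeled clips, so the consistency loss sees perturbed versions of the same input.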