Teacher-student Training for Acoustic Event Detection Using Audioset

2019 
This paper studies Acoustic Event Detection (AED) systems and the problem of their rapid and easy customisation to arbitrary deployment scenarios. Because annotating AED data is time-consuming and error-prone (time-stamping is often unclear), most of the available large-scale AED datasets are released with weak clip-level labels, which also affects how weakly-supervised training procedures should be designed. In this paper, we investigate a teacher-student training approach to learning low-complexity student models from large teachers. We first show that state-of-the-art performance can be achieved by a Convolutional Neural Network (CNN) model with an appropriate attention mechanism. Then we describe a framework that enables learning arbitrary small-footprint AED systems, either generic or domain-expert, from generic teachers. We carry out experiments on Audioset, a large-scale weakly labelled dataset of acoustic events.
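
The abstract does not give the model or loss in detail, so the following is only a minimal sketch of the two ideas it names: attention pooling of frame-level predictions into a clip-level prediction (to match weak clip-level labels), and a student objective that mixes teacher outputs with those weak labels. The function names (attention_pool, bce, student_loss), the mixing weight alpha, and the use of binary cross-entropy for both terms are assumptions made for illustration, not the paper's stated formulation.

```python
import numpy as np

def attention_pool(frame_probs, attn_logits):
    """Collapse frame-level probabilities (T, C) into clip-level ones (C,).

    Assumed form: a per-class softmax over time weights each frame.
    """
    # Subtract the per-class max for numerical stability before exponentiating.
    w = np.exp(attn_logits - attn_logits.max(axis=0, keepdims=True))
    w /= w.sum(axis=0, keepdims=True)
    return (w * frame_probs).sum(axis=0)

def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy; target may be hard (0/1) or soft."""
    pred = np.clip(pred, eps, 1 - eps)
    return float(-np.mean(target * np.log(pred)
                          + (1 - target) * np.log(1 - pred)))

def student_loss(student_clip, teacher_clip, weak_labels, alpha=0.5):
    """Hypothetical objective: match the teacher and the weak clip labels.

    alpha is an illustrative mixing weight, not a value from the paper.
    """
    return (alpha * bce(student_clip, teacher_clip)
            + (1 - alpha) * bce(student_clip, weak_labels))
```

With all attention logits equal, the pooling reduces to a plain average over frames, which is a convenient sanity check for the weighting.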