Utilizing Domain Knowledge in End-to-End Audio Processing.

Tycho Max Sylvester Tax, Jose Luis Diez Antich, Hendrik Purwins, Lars Maaløe

2017

End-to-end neural-network-based approaches to audio modelling are generally outperformed by models trained on high-level data representations. In this paper we present preliminary work showing that the first layers of a deep convolutional neural network (CNN) can be trained to learn the commonly used log-scaled mel-spectrogram transformation. We further demonstrate that when the first layers of an end-to-end CNN classifier are initialized with the learned transformation, convergence and performance on the ESC-50 environmental sound classification dataset are similar to those of a CNN-based model trained on the highly pre-processed log-scaled mel-spectrogram features.
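
For readers unfamiliar with the target representation, the log-scaled mel-spectrogram transformation mentioned in the abstract can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' pipeline: the window length, hop size, number of mel bands, the HTK-style mel formula, and all function names below are assumptions chosen for the example.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (an assumed convention; others exist).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    # n_mels + 2 edge frequencies, equally spaced on the mel scale.
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(signal, sr, n_fft=1024, hop=512, n_mels=40, eps=1e-10):
    """Return a log-scaled mel-spectrogram of shape (n_mels, n_frames)."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the waveform into overlapping frames and apply a Hann window.
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)
    # Power spectrum per frame, then pool FFT bins into mel bands.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    # eps avoids log(0) in silent bands.
    return np.log(mel + eps).T

# Usage on a synthetic one-second 440 Hz tone (hypothetical input).
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
spec = log_mel_spectrogram(tone, sr)
print(spec.shape)
```

Each step here (windowed framing, a fixed linear transform, a pointwise nonlinearity, a linear pooling, a logarithm) has a rough counterpart in convolutional layers, which gives some intuition for why the first layers of a CNN might be trained to approximate it.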

Keywords:

Convergence (routing)
Initialization
Machine learning
Artificial intelligence
Computer science
Convolutional neural network
Artificial neural network
Domain knowledge
End-to-end principle
Audio signal processing
Speech recognition
Classifier (linguistics)
