Audiomer: A Convolutional Transformer for Keyword Spotting

Surya Kant Sahu, Sai Mitheran, Juhi Kamdar, Meet Gandhi

2021

Transformers have seen an unprecedented rise in Natural Language Processing and Computer Vision tasks. However, in audio tasks, they are either infeasible to train because of the extremely long sequence lengths of raw audio waveforms, or reach competitive performance only after feature extraction through Fourier-based methods, which incurs a loss floor. In this work, we introduce Audiomer, an architecture that combines 1D Residual Networks with Performer Attention to achieve state-of-the-art performance in Keyword Spotting on raw audio waveforms, outperforming all previous methods while being computationally cheaper and much more parameter- and data-efficient. Audiomer allows for deployment on compute-constrained devices and training on smaller datasets.
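
To give a rough sense of why sequence length matters for attention on raw waveforms, the following sketch compares the number of entries materialised by full softmax attention with those of a kernelised (Performer-style) linear attention. Every constant here (sample rate, clip length, head dimension, number of random features) is an illustrative assumption and is not taken from the paper; the counting is a simplification, not a measurement of Audiomer itself.

```python
# Illustrative cost comparison. All constants are assumptions for the sake
# of the example, not values reported for Audiomer.
SAMPLE_RATE_HZ = 16_000  # assumed sample rate of the raw waveform
CLIP_SECONDS = 1         # assumed clip length
HEAD_DIM = 64            # hypothetical per-head dimension
NUM_FEATURES = 256       # hypothetical number of random features


def softmax_attention_entries(seq_len: int) -> int:
    """Entries in the full seq_len x seq_len attention matrix."""
    return seq_len * seq_len


def linear_attention_entries(seq_len: int, num_features: int, head_dim: int) -> int:
    """Entries held by kernelised attention: two seq_len x num_features
    feature maps (queries and keys) plus one num_features x head_dim summary."""
    return 2 * seq_len * num_features + num_features * head_dim


seq_len = SAMPLE_RATE_HZ * CLIP_SECONDS
full = softmax_attention_entries(seq_len)
linear = linear_attention_entries(seq_len, NUM_FEATURES, HEAD_DIM)

print(f"sequence length:          {seq_len:,}")
print(f"full attention entries:   {full:,}")
print(f"linear attention entries: {linear:,}")
```

Under these assumed settings the full attention matrix grows quadratically with the number of samples, while the kernelised variant grows linearly, which is the property that makes attention over raw audio tractable.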

Keywords:

Sequence
Residual
Software deployment
Transformer
Speech recognition
Waveform
Computer science
Feature extraction
Keyword spotting
Raw audio format
