Learnable MFCCs for Speaker Verification.

2021 
We propose a learnable mel-frequency cepstral coefficient (MFCC) front-end architecture for deep neural network (DNN) based automatic speaker verification. Our architecture retains the simplicity and interpretability of MFCC-based features while allowing the model to be flexibly adapted to data. In practice, we formulate data-driven versions of the four linear transforms in a standard MFCC extractor: windowing, discrete Fourier transform (DFT), mel filterbank, and discrete cosine transform (DCT). Reported results reach up to 6.7% (VoxCeleb1) and 9.7% (SITW) relative improvement in terms of equal error rate (EER) over static MFCCs, without additional tuning effort.

Index Terms: Speaker verification, feature extraction, mel-frequency cepstral coefficients (MFCCs).
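The abstract describes replacing the four linear transforms of a standard MFCC pipeline with trainable counterparts. The sketch below (PyTorch) illustrates one plausible reading of that idea: each transform is initialised to its textbook value and registered as a trainable parameter. It is a minimal sketch under assumptions, not the authors' implementation; all class, parameter, and default names here are illustrative.

```python
# Minimal sketch of a learnable MFCC front-end: the window, DFT basis,
# mel filterbank, and DCT matrix are initialised to their standard values
# and then learned jointly with the downstream speaker-verification DNN.
# This is an assumed illustration, not the paper's released code.
import math
import torch
import torch.nn as nn
import torchaudio


class LearnableMFCC(nn.Module):
    def __init__(self, n_fft=512, n_mels=40, n_mfcc=20, sample_rate=16000):
        super().__init__()
        # 1) Windowing: start from a Hamming window, then let it adapt.
        self.window = nn.Parameter(torch.hamming_window(n_fft))

        # 2) DFT: real and imaginary projection matrices of the DFT basis.
        n = torch.arange(n_fft).float()
        k = torch.arange(n_fft // 2 + 1).float()
        angle = 2 * math.pi * k[:, None] * n[None, :] / n_fft
        self.dft_real = nn.Parameter(torch.cos(angle))
        self.dft_imag = nn.Parameter(-torch.sin(angle))

        # 3) Mel filterbank: initialised from standard triangular filters.
        mel_fb = torchaudio.functional.melscale_fbanks(
            n_freqs=n_fft // 2 + 1, f_min=0.0, f_max=sample_rate / 2,
            n_mels=n_mels, sample_rate=sample_rate)
        self.mel_fb = nn.Parameter(mel_fb)  # shape (n_freqs, n_mels)

        # 4) DCT: type-II DCT matrix that decorrelates log-mel energies.
        dct = torchaudio.functional.create_dct(n_mfcc, n_mels, norm="ortho")
        self.dct = nn.Parameter(dct)  # shape (n_mels, n_mfcc)

    def forward(self, frames):
        # frames: (batch, num_frames, n_fft) of already-framed waveform samples
        x = frames * self.window                            # learnable windowing
        real = x @ self.dft_real.t()                        # learnable "DFT"
        imag = x @ self.dft_imag.t()
        power = real ** 2 + imag ** 2                       # power spectrum
        mel = torch.clamp(power @ self.mel_fb, min=1e-10)   # learnable filterbank
        cepstra = torch.log(mel) @ self.dct                 # learnable "DCT"
        return cepstra                                      # (batch, num_frames, n_mfcc)
```

Since every matrix is initialised to the static MFCC transform, the front-end reproduces ordinary MFCCs before training and only departs from them as far as the data demand, which is consistent with the interpretability claim in the abstract.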