Bat-inspired dynamic features and factors that modulate their impact on speech recognition

Alexander Hsu, Jin-Ping Han, Xiaodong Cui, Kartik Audhkhasi, Anupam K. Gupta, Joseph Sutlive, Tabassum Ahmed, Rolf Müller

2018

One of the most serious remaining challenges in speech recognition is dealing with corruption of the speech signal by other, nuisance speech ("babble"). A promising approach to separating the signal of interest from the distractors is to inject direction-dependent signatures into all received signals, which has been realized here with a bat-inspired biomimetic pinna (a dynamic periphery). Changing the shape of the biomimetic pinna during recording introduces substantial time-variant signatures into the speech signals. To investigate the utility of these signatures, we used bioinspired signal representations (cochleagram and spikegram) as input to speech classifiers based on Gaussian mixture models (GMM) and hidden Markov models (HMM). The speech samples were obtained from open-source databases: spoken digits and letters of the alphabet from Carnegie Mellon University were mixed with babble or noise samples from Columbia University. Since the time-variant signatures were found to depend strongly on the direction of the sound source, we included datasets from different directions in the training and testing data fed to the classifiers. The results indicate that a dynamic periphery can substantially improve recognition and that these effects depend on the signal representation as well as on the angular composition of the training dataset.
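The mixing of speech with babble described above can be illustrated with a minimal sketch. The abstract does not state how the mixtures were scaled, so the target signal-to-noise ratio (`snr_db`) and the helper name `mix_at_snr` are assumptions made purely for illustration, not the authors' procedure.

```python
import numpy as np

def mix_at_snr(speech, babble, snr_db):
    """Add babble to speech so the mixture has the requested SNR in dB.

    Hypothetical helper: the SNR-based scaling is an assumption, since the
    source does not specify how speech and babble were combined.
    """
    speech = np.asarray(speech, dtype=float)
    babble = np.asarray(babble, dtype=float)
    # Loop the babble if it is shorter than the speech, then trim to length.
    reps = int(np.ceil(len(speech) / len(babble)))
    babble = np.tile(babble, reps)[: len(speech)]
    speech_power = np.mean(speech ** 2)
    babble_power = np.mean(babble ** 2)
    # Gain that scales the babble to the power implied by the target SNR.
    gain = np.sqrt(speech_power / (babble_power * 10 ** (snr_db / 10)))
    return speech + gain * babble
```

Lower values of `snr_db` give a harder recognition task, so sweeping this parameter is one plausible way to probe how much the direction-dependent signatures help.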
