On the Use of Spectro-Temporal Features in Noise-Additive Speech

2011 
by Suman Ravuri
Master of Science in Electrical Engineering and Computer Science
University of California, Berkeley
Professor Nelson Harold Morgan, Chair

Most existing speech features attempt to model some aspect of human hearing. MFCCs, for instance, mimic the action of the basilar membrane through triangular filter integration on the mel frequency scale, and emulate the hair cell stage with log compression. Most of this auditory inspiration has been limited to the very early stages of human hearing, since going further requires understanding parts of the brain, which has so far proven impossible. Recently, however, a number of experiments on the primary auditory cortex of ferrets (which shares many features with its human counterpart) have shown that individual neurons are tuned to fire at one particular spectral and temporal modulation. Incorporating these features could help automatic speech recognition, since existing features do not use this sort of parameterization. The use of these spectro-temporal features has been shown to improve speech recognition performance in a variety of tasks.

In this work, I investigate the performance of spectro-temporal features under noisy conditions. In particular, I study their performance in mismatched conditions, in which the training set consists only of clean data and the test set comprises speech to which a variety of noises has been added. Although spectro-temporal features can be shown to improve recognition performance in this task, a number of hurdles must be overcome in order to incorporate them into a speech recognition system. One major problem is high dimensionality: spectro-temporal features can comprise tens of thousands of dimensions, which