Multi-Parameter Frequency Warping for Vtln by Gradient Search

2006 
The current method for estimating frequency warping (FW) functions for vocal tract length normalization (VTLN) is by maximizing the ASR likelihood score by an exhaustive search over a grid of FW parameters. Exhaustive search is inefficient when estimating multi-parameter FWs, which have been shown to give improvements in recognition accuracy over single parameter FWs (J.W. McDonough, 2000). Here we develop a gradient search algorithm to obtain the optimal FW parameters for MFCC features, since previous work focussed on PLP cepstral features (J.W. McDonough, 2000). The novel calculation involved was that of the gradient of the Mel filterbank with respect to the FW parameters. Even for a single parameter, the gradient search method was more efficient than grid search by a factor of around 1.6 on the average for male children speakers tested on models trained from adult males. When used to estimate multi-parameter sine-log allpass transform (SLAPT, (J.W. McDonough, 2000)) FWs for VTLN, more than 50% reduction in word error rate was obtained with five parameter SLAPT compared to single-parameter piecewise linear FW
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    16
    References
    10
    Citations
    NaN
    KQI
    []