Improved Estimation of Articulatory Features Based on Acoustic Features with Temporal Context

2015 
The paper deals with neural network-based estimation of articulatory features for Czech, intended for use in automatic phonetic segmentation or automatic speech recognition. In our approach, multi-layer perceptron networks extract the articulatory features through a non-linear mapping from standard acoustic features computed from the speech signal. We analysed the suitability of various acoustic features and the optimum length of the temporal context at the network input, where the temporal context is represented by a context window built from stacked feature vectors; window lengths from 9 to 21 frames were examined. With mel-log filter-bank features we obtained an average frame-level accuracy of 90.5% across all articulatory feature classes; the highest classification rate, 95.3%, was achieved for the voicing class.
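As a rough illustration of the context-window idea described above, the sketch below stacks each frame's feature vector with its neighbours and trains a small multi-layer perceptron on the stacked input. All names, dimensions, and data here are assumptions for illustration only; the paper's actual feature extraction, network configuration, and Czech corpus are not reproduced.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def stack_context(features, context=11):
    """Stack each frame with its neighbours into one context window.

    features : (n_frames, n_dims) array of per-frame acoustic features
    context  : odd window length; the paper examines 9 to 21 frames
    Returns an (n_frames, context * n_dims) array; signal edges are
    handled by repeating the first and last frame.
    """
    half = context // 2
    padded = np.pad(features, ((half, half), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + len(features)] for i in range(context)])

# Hypothetical stand-in data: 1000 frames of 24-dim mel-log filter-bank
# features with a binary voicing label per frame (shapes are assumptions).
rng = np.random.default_rng(0)
fbank = rng.standard_normal((1000, 24))
voicing = rng.integers(0, 2, size=1000)

X = stack_context(fbank, context=11)   # 11-frame window -> 264-dim input
mlp = MLPClassifier(hidden_layer_sizes=(256,), max_iter=200, random_state=0)
mlp.fit(X, voicing)
print("frame-level accuracy:", mlp.score(X, voicing))
```

Widening the context window trades finer temporal detail for more surrounding evidence, which is why the paper searches the 9 to 21 frame range rather than fixing a single length.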