On the use of filter bank features for isolated word recognition

1983 
The vast majority of commercially available isolated word recognizers use filter bank analysis as the front-end processing for recognition. Designers of such recognizers face a myriad of design choices, and it is not well understood how these choices (e.g., the number of filters, the filter spacing, and the post-processing of the filter bank outputs) affect recognizer performance. In this paper we present the results of a performance evaluation of filter bank recognizers in a speaker-trained, isolated word recognition test using recordings made over dialed-up telephone lines. First, we studied the effects of various filter bank parameters on system performance. We designed a total of 13 filter banks, 8 uniform and 5 non-uniform, and within these 13 we considered both slightly and highly overlapping filters. The results indicate that the best performance (highest word accuracy) on the 39-word alpha-digits vocabulary, with 4 talkers, is obtained both by a 15-channel uniform filter bank and by a 13-channel, highly overlapping, non-uniform critical-band filter bank. Next, we studied the effects of selected preprocessing and post-processing techniques on system performance, using the 13-channel non-uniform filter bank. The results indicate that almost none of these processing techniques improved system performance; however, some of them can potentially reduce hardware cost (computation and storage) without adversely affecting performance. We also compared the best filter bank recognizers with a conventional LPC-based word recognizer on the same vocabulary: the best filter bank performed approximately 4% worse than an 8th-order LPC-based recognizer. Finally, we studied the effect of additive wideband noise on the performance of both the filter bank and the LPC recognizers. White Gaussian noise was added to the speech recordings at signal-to-noise ratios from 0 to 30 dB.
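The noise condition above can be illustrated with a minimal NumPy sketch. It is not the authors' procedure: the abstract does not say how signal power was measured, so this sketch assumes the SNR is computed over the whole recording (silences included), and the function names `add_white_noise` and `measured_snr_db` are hypothetical.

```python
import numpy as np


def add_white_noise(speech, snr_db, rng=None):
    """Return speech plus white Gaussian noise at the requested SNR.

    Assumption: SNR is measured over the entire utterance, i.e.
        SNR_dB = 10 * log10(P_signal / P_noise),
    so the target noise power is P_signal / 10 ** (SNR_dB / 10).
    """
    rng = np.random.default_rng() if rng is None else rng
    speech = np.asarray(speech, dtype=float)
    signal_power = np.mean(speech ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.standard_normal(speech.shape)
    # Rescale the random draw so its sample power equals the target exactly.
    noise *= np.sqrt(noise_power / np.mean(noise ** 2))
    return speech + noise


def measured_snr_db(clean, noisy):
    """SNR (dB) of a noisy signal relative to its clean reference."""
    noise = noisy - clean
    return 10.0 * np.log10(np.mean(clean ** 2) / np.mean(noise ** 2))
```

Calling `add_white_noise` once per value in a sweep such as 0, 6, 12, ..., 30 dB (the step size is our assumption; the abstract gives only the range) would produce a set of noisy test conditions. Because the noise is rescaled to its exact target power, the measured SNR matches the requested value up to floating-point rounding. Measuring signal power over speech-active frames only would give a higher signal power, and hence more added noise, for the same nominal SNR.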
Recognition tests on the noisy recordings indicated that the performance of the LPC system degraded faster than that of the filter banks; however, the two systems did not reach identical performance until the signal-to-noise ratio dropped to 6 dB.
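To make the contrast between uniform and critical-band filter spacing concrete, here is a minimal NumPy sketch of an FFT-based filter bank front end. Every specific in it is an assumption rather than the paper's design: triangular filter shapes, a 200 to 3200 Hz telephone passband, 8 kHz sampling, the uncorrected form of Traunmüller's (1990) Bark-scale approximation for critical-band spacing, log compression of the channel energies, and the hypothetical helper names. Adjacent triangles share edge frequencies, which corresponds to a highly overlapping layout.

```python
import numpy as np


def hz_to_bark(f):
    """Traunmüller (1990) critical-band (Bark) approximation, uncorrected."""
    return 26.81 * f / (1960.0 + f) - 0.53


def bark_to_hz(z):
    """Exact algebraic inverse of hz_to_bark."""
    return 1960.0 * (z + 0.53) / (26.28 - z)


def channel_edges(n_channels, f_lo, f_hi, spacing):
    """Return n_channels + 2 edge frequencies; channel k peaks at edge k + 1.

    Each triangle spans two edge intervals, so neighbouring channels
    overlap across one shared interval (a highly overlapping layout).
    """
    if spacing == "uniform":
        return np.linspace(f_lo, f_hi, n_channels + 2)
    if spacing == "critical":
        # Equal steps on the Bark scale give bandwidths that widen with frequency.
        z = np.linspace(hz_to_bark(f_lo), hz_to_bark(f_hi), n_channels + 2)
        return bark_to_hz(z)
    raise ValueError(f"unknown spacing: {spacing!r}")


def filter_bank(n_channels, n_fft, fs, f_lo=200.0, f_hi=3200.0, spacing="critical"):
    """Triangular weights with shape (n_channels, n_fft // 2 + 1)."""
    edges = channel_edges(n_channels, f_lo, f_hi, spacing)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    weights = np.zeros((n_channels, freqs.size))
    for k in range(n_channels):
        lo, mid, hi = edges[k], edges[k + 1], edges[k + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        # min() of the two slopes forms the triangle; clip zeroes the outside.
        weights[k] = np.clip(np.minimum(rising, falling), 0.0, None)
    return weights


def log_channel_energies(frame, weights, n_fft):
    """Log10 energy in each channel for one Hamming-windowed frame."""
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(frame.size), n_fft)) ** 2
    return np.log10(weights @ spectrum + 1e-10)  # floor avoids log(0)
```

For example, `filter_bank(15, 256, 8000, spacing="uniform")` and `filter_bank(13, 256, 8000, spacing="critical")` mirror the two best-performing channel counts reported above. With critical spacing the channel bandwidths grow with frequency, whereas the uniform bank keeps them constant; feeding each frame's output through `log_channel_energies` yields one feature vector per frame for a recognizer.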