Fepstrum and Carrier Signal Decomposition of Speech Signals Through Homomorphic Filtering

2006 
Amplitude modulation (AM) and frequency modulation (FM) have been well defined and studied in the context of communications systems (S. Haykin, 1994). Borrowing from these ideas, several researchers have applied AM-FM modeling to speech signals (V. Tyagi et al., 2003; M. Athineos et al., 2004; Q. Zhu and A. Alwan, 2000; B.E.D. Kingsbury et al., 1998), with mixed results. These techniques have varied in their definitions of AM and FM and, consequently, in the demodulation methods they use. In this paper, we carefully define AM and FM signals in the context of ASR. We show that a theoretically meaningful estimate of the AM signal requires decomposing the speech signal into several narrow spectral bands. This contrasts with previous uses of the speech modulation spectrum (V. Tyagi et al., 2003; M. Athineos et al., 2004; Q. Zhu and A. Alwan, 2000; B.E.D. Kingsbury et al., 1998), which was derived by decomposing the speech signal into increasingly wide spectral bands (such as critical, Bark, or Mel bands). Due to the Hilbert relationships, the AM signal induces a component in the FM signal that is fully determinable from the AM signal (R. Kumaresan and A. Rao, 1999; V. Tyagi and C. Wellekens, 2005). We present a novel homomorphic filtering technique that suppresses this redundant part of the FM signal and extracts the remaining FM signal. The estimated AM message signals are downsampled, and their lower DCT coefficients are retained as speech features. These features carry information that is complementary to the MFCCs. A Tandem (H. Hermansky, 2003) combination of these two feature sets is shown to improve recognition accuracy.
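The AM feature pipeline outlined in the abstract (narrow-band decomposition, AM estimation, downsampling, low-order DCT) can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not give the demodulation details, so the sketch assumes a plain Hilbert-envelope estimate (the magnitude of a band-limited analytic signal), and it does not attempt the homomorphic filtering step for the FM signal. The function names, band edges, decimation step, and number of coefficients are all hypothetical choices made for illustration, and a direct DFT from the standard library is used only to keep the example self-contained.

```python
import cmath
import math


def dft(x, inverse=False):
    """Direct O(n^2) discrete Fourier transform; adequate for short frames."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = 0j
        for t in range(n):
            s += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(s / n if inverse else s)
    return out


def band_envelope(signal, fs, f_lo, f_hi):
    """Assumed AM estimate for one narrow band.

    Keeps only positive-frequency bins inside [f_lo, f_hi], doubles them,
    and inverts, which gives the analytic signal of the band-passed input.
    Its magnitude is taken as the AM (envelope) signal.
    """
    n = len(signal)
    spectrum = dft(signal)
    analytic_spec = [0j] * n
    for k in range(1, n // 2):  # skip DC and Nyquist
        freq = k * fs / n
        if f_lo <= freq <= f_hi:
            analytic_spec[k] = 2 * spectrum[k]
    analytic = dft(analytic_spec, inverse=True)
    return [abs(z) for z in analytic]


def dct2(x, num_coeffs):
    """First num_coeffs coefficients of the unnormalised DCT-II."""
    n = len(x)
    return [
        sum(x[t] * math.cos(math.pi * k * (t + 0.5) / n) for t in range(n))
        for k in range(num_coeffs)
    ]


def am_features(signal, fs, f_lo, f_hi, step, num_coeffs):
    """Envelope of one band, decimated by `step`, then low-order DCT.

    Plain decimation without an anti-aliasing filter is an assumption that
    only holds when the band is narrow, so the envelope varies slowly.
    """
    envelope = band_envelope(signal, fs, f_lo, f_hi)
    decimated = envelope[::step]
    return dct2(decimated, num_coeffs)
```

As a usage example, a synthetic tone at 1000 Hz whose amplitude is modulated at 62.5 Hz, sampled at 8000 Hz for 256 samples, can be passed to `am_features(sig, 8000.0, 900.0, 1100.0, step=8, num_coeffs=4)`. The 900 to 1100 Hz band is narrow enough to contain the carrier and both sidebands, so the envelope recovers the modulating message. A wide band containing several such components would mix their envelopes, which is the motivation the abstract gives for narrow-band decomposition.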