Evaluation of Efficient Compression Properties of the Complete Oscillator Method, Part 2: Speech Coding

Anton Y. Yen, Irina Gorodnitsky

2013

Summary form only given. This paper examines the performance of the recently proposed Complete Oscillator Method (COM) in the context of speech coding. The COM is shown to provide several advantages over traditional predictive coding techniques. Unlike the cascaded method employed by codecs such as Adaptive Multi-Rate (AMR), the COM encodes short-term and long-term data features jointly using a single, flexible representation. Joint approaches have previously been shown to yield efficiency gains [1]. Furthermore, the COM does not always require an explicit encoding of the residual error to reconstruct the signal. Because AMR can allocate as much as 85% of its coding budget to encoding the residual, there is substantial motivation for finding alternatives to source-filter coding methods. The first part of the paper compares the synthesis of speech frames using the COM versus a combination of linear predictor and adaptive codebook (LPAC) in order to assess the deterministic modeling capabilities of the COM relative to linear predictive codes. With both approaches optimized by minimizing the perceptually weighted error (PWE) between the original and reconstructed speech, the COM is shown to achieve lower PWE on average than LPAC as implemented in the AMR standard for several types of speech. The COM improved PWE in 78.20% of voiced frames, yielding a 2.02 dB PWE gain on average. For voiced-to-unvoiced transitions, the COM improved PWE in 76.75% of the frames, with a 1.26 dB average gain. For unvoiced speech, the COM consistently improved PWE, but the average gain was not significant. Only for unvoiced-to-voiced transitions did the COM not produce gains in average PWE.

The second part of the paper compares the synthesis of speech frames using the COM at several bit rates to the standard AMR and Speex codecs, showing that the COM can produce speech of comparable quality in a significant percentage of frames. Using weighted spectral slope distance (WSS) as a metric, a 5.5 kbps COM outperformed 12.2 kbps AMR in 24.12% of speech frames. These results are not intended to demonstrate the workings of a COM-only speech coder, but rather to suggest how existing codecs can achieve lower bit rates by using the COM to encode some subset of frames. For example, by using the COM in the lowest bit rate mode sufficient to achieve a WSS similar to that of 12.2 kbps AMR, the average bit rate can potentially be reduced to 9.16 kbps.
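As a rough illustration of how a mixed-mode average bit rate of this kind is computed, the sketch below takes a frame-weighted mean over coding modes. It assumes equal-length frames. The function name `average_bit_rate` and the two-mode split are hypothetical and are not taken from the paper. Only the 5.5 kbps, 12.2 kbps, and 24.12% figures come from the abstract. The abstract does not give the frame shares of the other COM bit rates, so the sketch does not attempt to reproduce the 9.16 kbps figure.

```python
def average_bit_rate(mode_shares):
    """Frame-weighted average bit rate in kbps.

    mode_shares maps a bit rate in kbps to the fraction of frames coded
    at that rate. Equal-length frames are assumed (an assumption of this
    sketch, not a statement from the paper).
    """
    total = sum(mode_shares.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("frame fractions must sum to 1")
    return sum(rate * share for rate, share in mode_shares.items())


# Two-mode case built only from figures in the abstract: 24.12% of frames
# coded by the 5.5 kbps COM, the remainder by 12.2 kbps AMR.
com_share = 0.2412
two_mode = average_bit_rate({5.5: com_share, 12.2: 1.0 - com_share})
print(f"{two_mode:.2f} kbps")  # prints "10.58 kbps"
```

With only the 5.5 kbps mode substituted, the average falls from 12.2 kbps to about 10.58 kbps. An average as low as the reported 9.16 kbps would therefore need further frames coded by the COM at other rates below 12.2 kbps, which is consistent with the abstract's description of selecting the lowest sufficient mode per frame.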
