This paper proposes a speech codec for the ETSI adaptive multi-rate (AMR) standard based on multi-pulse-based CELP (MP-CELP) speech coding and convolutional channel coding. The codec operates at several speech coding rates while keeping the gross bit rate (speech plus channel coding) fixed in both the full-rate (FR) and half-rate (HR) channel modes. A key advantage of MP-CELP is that the speech coding rate can be changed easily by adjusting parameters such as the number of pulses. By selecting the optimal coding rate for each channel condition, the proposed AMR codec in the FR channel mode outperforms the enhanced full-rate (EFR) codec in subjective tests, and the proposed codec in the HR channel mode achieves coding quality comparable to that of the full-rate codec. The results of t-tests on the subjective scores also show that the proposed codec meets about 80% of seventeen requirements selected from the AMR standard study report. The proposed codec is therefore a promising candidate for the AMR standard.
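To make the fixed-gross-rate trade-off concrete, the following is a minimal Python sketch of the per-frame bit budget: every bit not spent on speech is available for channel coding. It assumes 20 ms frames and the GSM gross channel rates of 22.8 kbit/s (FR) and 11.4 kbit/s (HR), which the abstract does not state. The mode table `MODES` and its C/I thresholds are purely hypothetical, and the convolutional code itself is not modeled.

```python
FRAME_MS = 20  # assumption: 20 ms speech frames
GROSS_BPS = {"FR": 22800, "HR": 11400}  # assumption: GSM full-rate / half-rate gross rates

# Hypothetical speech modes, highest rate first: (speech bit rate in bit/s, minimum C/I in dB).
MODES = [(12000, 16.0), (9600, 12.0), (8000, 9.0), (6400, 6.0), (4800, float("-inf"))]


def bits_per_frame(bps, frame_ms=FRAME_MS):
    """Number of bits carried in one frame at the given bit rate."""
    return bps * frame_ms // 1000


def select_mode(channel, ci_db):
    """Pick the highest speech rate whose (hypothetical) C/I threshold is met
    and which still leaves room for channel coding inside the gross rate."""
    gross = GROSS_BPS[channel]
    for speech_bps, min_ci in MODES:
        if speech_bps < gross and ci_db >= min_ci:
            return speech_bps
    raise ValueError("no speech mode fits this channel")


def bit_budget(channel, speech_bps):
    """Split the fixed gross frame into speech bits and channel-coding bits."""
    gross = bits_per_frame(GROSS_BPS[channel])
    speech = bits_per_frame(speech_bps)
    if speech >= gross:
        raise ValueError("speech rate exceeds gross rate")
    return speech, gross - speech, speech / gross  # (speech bits, protection bits, code rate)


if __name__ == "__main__":
    for ci in (20.0, 7.0, -5.0):
        mode = select_mode("FR", ci)
        print(ci, mode, bit_budget("FR", mode))
```

A poor channel (low C/I) pushes the selection toward a lower speech rate, which frees more of the fixed frame for error protection.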
A wideband noise suppressor for the AMR (adaptive multi-rate) wideband speech codec is proposed. It features weighted noise estimation for an accurate noise estimate, pseudo noise injection to obtain a more suitable spectral gain, and synthesis windowing for smooth transitions at frame boundaries. In subjective evaluation with the AMR wideband speech codec, the proposed noise suppressor satisfies all eighteen provisional requirements in absolute category rating (ACR) and ten of the twelve provisional requirements in comparison category rating (CCR); these provisional requirements were originally specified for the AMR narrowband noise suppressor. Although it does not meet two of the CCR requirements, its basic performance suggests that the proposed wideband noise suppressor would most likely meet all requirements in an evaluation with the 24 listeners specified in the test plan for the AMR narrowband noise suppressor.
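The three building blocks named above can be illustrated with a generic frequency-domain suppressor. This is a minimal NumPy sketch, not the proposed algorithm: the SNR-dependent weighting in `update_noise` is a hypothetical choice, the paper's pseudo noise injection is not reproduced (a constant gain floor stands in for it), and the frame length of 256 is arbitrary. Square-root periodic Hann windows are applied at both analysis and synthesis so that 50%-overlap-add is smooth and, with unity gain, reconstructs the input exactly.

```python
import numpy as np


def periodic_hann(n):
    """Periodic Hann window: w[k] + w[k + n/2] == 1 for 50% overlap."""
    k = np.arange(n)
    return 0.5 - 0.5 * np.cos(2.0 * np.pi * k / n)


def update_noise(noise_psd, frame_psd, base_alpha=0.9):
    """Weighted recursive noise estimate (hypothetical weighting).

    Bins whose power is close to the current noise estimate (low a-posteriori
    SNR) get a smaller smoothing factor and update faster; bins that look
    like speech update slowly.
    """
    post_snr = frame_psd / np.maximum(noise_psd, 1e-12)
    weight = 1.0 / (1.0 + post_snr)            # in (0, 1]
    alpha = 1.0 - (1.0 - base_alpha) * weight  # in [base_alpha, 1)
    return alpha * noise_psd + (1.0 - alpha) * frame_psd


def spectral_gain(frame_psd, noise_psd, floor=0.1):
    """Wiener-style gain with a constant floor (stand-in for residual-noise control)."""
    prior_snr = np.maximum(frame_psd / np.maximum(noise_psd, 1e-12) - 1.0, 0.0)
    return np.maximum(prior_snr / (1.0 + prior_snr), floor)


def suppress(x, frame_len=256, floor=0.1):
    """Frame-wise suppression with sqrt-Hann analysis/synthesis windows (50% overlap)."""
    hop = frame_len // 2
    win = np.sqrt(periodic_hann(frame_len))
    out = np.zeros(len(x))
    noise = None
    for start in range(0, len(x) - frame_len + 1, hop):
        spec = np.fft.rfft(x[start:start + frame_len] * win)
        psd = np.abs(spec) ** 2
        # initialise the noise estimate from the first frame, then update it
        noise = psd.copy() if noise is None else update_noise(noise, psd)
        gain = spectral_gain(psd, noise, floor)
        # the synthesis window tapers each frame so overlap-add is smooth at boundaries
        out[start:start + frame_len] += np.fft.irfft(spec * gain, frame_len) * win
    return out
```

Calling `suppress(x, floor=1.0)` forces unity gain and returns the input unchanged away from the first and last half-frame, which is a quick check that the windowing itself introduces no distortion.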
This paper analyzes early-stage proposals in standardization to examine companies' efforts to obtain standard-essential patents (SEPs). The number of SEPs has increased greatly in the information and communication technology (ICT) field, and some ICT companies strategically aim to obtain them. For such companies, it is important to file patents in a timely manner and to propose the technologies covered by those patents to standardization bodies from the early stages of the standardization process. This study investigates oneM2M Release 1, a standard for machine-to-machine (M2M) communication. First, the proponents of each item in the functional-requirement specification, which was discussed in the early stage, are analyzed to identify the technical areas of interest to major companies. Then, the relationship between these areas of interest (the early-stage proposals) and the companies' patents is investigated. The results show that the early-stage proposals of vendors (manufacturers) overlap substantially with their patents, suggesting that vendors propose strategically from the early stages in order to obtain SEPs. In contrast, less overlap is observed for non-practicing entities (NPEs) and mobile operators. Because NPEs filed patents across a wide range of technical fields, they likely intend to obtain SEPs through other companies' proposals. Mobile operators make a relatively large number of proposals compared with the number of patents they filed, which suggests that they consider expanding the M2M market more important than profiting from SEPs. These observations indicate that companies in different business categories pursue different standardization strategies with respect to obtaining SEPs.
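The proposal-versus-patent comparison can be expressed as a simple set-overlap measure. The sketch below is only an illustration: the abstract does not specify how overlap was quantified, so `overlap_ratio` (the share of a company's proposal areas that its patents also cover) is a hypothetical metric, and the area labels in the example are toy values, not data from the study.

```python
def overlap_ratio(proposal_areas: set, patent_areas: set) -> float:
    """Hypothetical metric: fraction of a company's early-stage proposal
    areas that are also covered by its patents (0.0 when it made no proposals)."""
    if not proposal_areas:
        return 0.0
    return len(proposal_areas & patent_areas) / len(proposal_areas)


if __name__ == "__main__":
    # Toy labels only: a company whose proposals track its patent portfolio
    # scores high; one whose patents lie elsewhere scores low.
    print(overlap_ratio({"registration", "security"}, {"registration", "security", "discovery"}))
    print(overlap_ratio({"registration"}, {"discovery", "data management"}))
```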
This paper proposes a flexible CELP speech coder with bitrate and bandwidth scalability for multimedia applications. The coder is based on multi-pulse-based CELP coding and consists of a bitrate-scalable base-band coder and a bandwidth extension tool. The bitrate-scalable base-band CELP coder employs multi-stage excitation coding based on an embedded-coding approach, in which the multi-pulse excitation codebook at each stage is adaptively produced depending on the excitation signal selected at the previous stage. Bandwidth scalability is realized by converting the base-band CELP parameters into wideband parameters, rather than by the widely used subband structure. This bandwidth conversion simultaneously improves the base-band coding quality and expands the bandwidth. Comparison tests show that, for narrowband speech, the bitrate-scalable coder is equivalent in speech quality to a fixed-bitrate CELP coder at the same bitrate. In mean opinion score (MOS) tests, the proposed 16 kbit/s coder with bandwidth scalability achieves coding quality equivalent to that of ITU-T G.722 at 56 kbit/s. The proposed coder is currently being evaluated as the MPEG-4 CELP speech coding standard.
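The embedded multi-stage idea can be sketched in a few lines: each stage codes what the previous stages left over, so a decoder can stop after any number of stages and still obtain a valid, coarser excitation. This is a simplified NumPy illustration under loud assumptions: there is no synthesis filter or perceptual weighting, pulse amplitudes are unquantized, and the "adaptive codebook" is modeled hypothetically by excluding pulse positions already used in earlier stages.

```python
import numpy as np


def encode_stages(target, pulses_per_stage, num_stages):
    """Greedy embedded pulse coding of a target signal (simplified sketch).

    Each stage picks the largest-magnitude samples of the current residual,
    restricted (hypothetical adaptation) to positions not used by earlier stages.
    """
    residual = np.asarray(target, dtype=float).copy()
    used = set()
    stages = []
    for _ in range(num_stages):
        order = np.argsort(-np.abs(residual), kind="stable")
        chosen = [int(p) for p in order if int(p) not in used][:pulses_per_stage]
        pulses = [(p, float(residual[p])) for p in chosen]
        for p, amp in pulses:
            residual[p] -= amp  # this stage removes the residual at its pulse positions
            used.add(p)
        stages.append(pulses)
    return stages


def decode(stages, length, num_stages):
    """Rebuild the excitation from the first `num_stages` stages only."""
    excitation = np.zeros(length)
    for pulses in stages[:num_stages]:
        for p, amp in pulses:
            excitation[p] += amp
    return excitation


if __name__ == "__main__":
    target = np.array([0.5, -2.0, 0.1, 1.5, 0.0, -0.7, 0.3, 1.0])
    stages = encode_stages(target, pulses_per_stage=2, num_stages=4)
    for k in range(5):
        err = np.sum((target - decode(stages, len(target), k)) ** 2)
        print(k, round(float(err), 4))
```

Because every stage only subtracts from the residual, the error never increases as more stages are decoded, which is the property that lets the bitstream be truncated for lower bitrates.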
This paper presents a recursive estimation method for ARMA parameters based on a robust time-varying model for speech analysis. The algorithm is basically similar to recursive least-squares (RLS) estimation but differs in that the variation of the ARMA parameters over time depends on their past values. This reflects the assumption that the speech production process does not change instantaneously. The method consists of two linear estimators: an input estimator and a parameter estimator for known input. The variation of the parameters is estimated using the likelihood function. Under certain conditions, the proposed method is equivalent to RLS with a forgetting factor; however, the proposed method allows this factor to be estimated as the value that represents the variation of the parameters. Finally, the proposed method was applied to synthetic and real speech. The results show that the estimated spectra adequately represent the dynamic movement of formants without jitter or extreme estimation errors.
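Since the proposed method reduces to RLS with a forgetting factor under certain conditions, that special case is a useful reference point. The NumPy sketch below is the standard exponentially weighted RLS applied to an all-pole (AR) model only; it is not the proposed ARMA estimator, it uses a fixed forgetting factor `lam` rather than estimating it, and the synthetic AR(2) coefficients are arbitrary test values.

```python
import numpy as np


def rls_ar(x, order=2, lam=0.999, delta=1000.0):
    """Exponentially weighted RLS estimate of AR coefficients a_k in
    x[n] = sum_k a_k x[n-k] + e[n]. Returns the coefficient track over time."""
    theta = np.zeros(order)
    P = delta * np.eye(order)              # large initial inverse-correlation matrix
    history = []
    for n in range(order, len(x)):
        phi = x[n - order:n][::-1]         # regressor [x[n-1], x[n-2], ...]
        Pphi = P @ phi
        k = Pphi / (lam + phi @ Pphi)      # gain vector
        err = x[n] - theta @ phi           # a-priori prediction error
        theta = theta + k * err
        P = (P - np.outer(k, Pphi)) / lam  # forgetting factor discounts old data
        history.append(theta.copy())
    return np.array(history)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    e = rng.standard_normal(5000)
    x = np.zeros(5000)
    for n in range(2, 5000):
        x[n] = 1.3 * x[n - 1] - 0.6 * x[n - 2] + e[n]  # stable AR(2) test signal
    print(rls_ar(x)[-1])
```

A smaller `lam` tracks faster parameter changes at the cost of noisier estimates; the abstract's contribution is to estimate this trade-off from the data instead of fixing it by hand.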
This paper evaluates MPEG-4 narrowband (NB) CELP speech coding under various mobile communication conditions, including clean speech, background noise, and transmission errors. To make the codec robust against transmission errors with a minimal increase in redundant bits, a cyclic redundancy check (CRC) code is attached for error protection and error concealment is included in the decoder. Subjective evaluation results demonstrate that, in the clean speech condition, MPEG-4 speech coding at bit rates above 8.3 kb/s achieves higher speech quality than ITU-T G.726 ADPCM at 32 kb/s. Furthermore, at a bit error rate of 10^-3, the speech quality degradation is less than 0.1 MOS, and the quality remains comparable to or higher than that of G.726 at 32 kb/s without errors.
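A CRC paired with concealment works as detect-then-replace: the receiver recomputes the checksum and, on a mismatch, substitutes the last good frame instead of decoding corrupted parameters. The sketch below illustrates only that pattern. It assumes a generic CRC-8 with polynomial x^8 + x^2 + x + 1 (0x07), which is not necessarily the CRC used in MPEG-4 CELP, and it protects a whole hypothetical payload rather than a selected set of sensitive bits.

```python
def crc8(data: bytes, poly: int = 0x07, init: int = 0x00) -> int:
    """Bitwise MSB-first CRC-8 (generic polynomial 0x07, assumed for illustration)."""
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def protect(payload: bytes) -> bytes:
    """Append the CRC byte to a (hypothetical) frame of coded parameters."""
    return payload + bytes([crc8(payload)])


def receive(frame: bytes, last_good: bytes) -> bytes:
    """Return the payload if the CRC checks; otherwise conceal by repeating
    the last correctly received frame."""
    payload, check = frame[:-1], frame[-1]
    if crc8(payload) == check:
        return payload
    return last_good


if __name__ == "__main__":
    frame = protect(b"\x12\x34\x56")
    damaged = bytearray(frame)
    damaged[0] ^= 0x01  # single bit error in transmission
    print(receive(frame, b"prev"), receive(bytes(damaged), b"prev"))
```

Real decoders typically use smarter concealment than plain repetition (for example, attenuating gains over consecutive lost frames), but the detection step is the same.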
This paper proposes MP-CELP (multi-pulse-based CELP) speech coding at 6.4 kb/s with a 10 ms frame. In MP-CELP, the amplitudes or signs of the multi-pulse excitation are jointly vector quantized (VQ). A joint search over multiple pulse-location candidates and the VQ codebook markedly improves quantization performance. To improve speech quality under background noise conditions, an adaptive pulse location restriction method is developed. Subjective evaluation results show that, in clean speech and tandem conditions, the speech quality of 6.4 kb/s MP-CELP is higher than that of G.726 at 32 kb/s and equivalent to that of 6.3 kb/s G.723.1 with a 30 ms frame. Under background noise conditions, the adaptive pulse location restriction significantly improves the MOS, by 0.9. The resulting speech quality is equivalent to that of G.723.1 but still falls short of 24 kb/s G.726, except in the interfering-talker condition.
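The joint search can be sketched as an exhaustive loop over every (pulse-location candidate, amplitude codevector) pair, keeping the pair whose filtered excitation, scaled by its optimal gain, best matches the target. This NumPy sketch makes several assumptions not stated in the abstract: a short fixed impulse response `h` stands in for the weighted synthesis filter, the candidate sets and codebook are tiny toy values, and the gain is left unquantized. For each pair, the squared error with optimal gain g = <x, y>/<y, y> is ||x||^2 - g<x, y>.

```python
import numpy as np


def synthesize(locations, amps, h, n):
    """Filter a multi-pulse excitation through impulse response h (truncated to n samples)."""
    excitation = np.zeros(n)
    for p, a in zip(locations, amps):
        excitation[p] += a
    return np.convolve(excitation, h)[:n]


def joint_search(target, candidates, codebook, h):
    """Exhaustive joint search over pulse-location candidates and an amplitude/sign
    VQ codebook. Returns (candidate index, codevector index, optimal gain)."""
    best = None
    energy_x = target @ target
    for ci, locs in enumerate(candidates):
        for vi, amps in enumerate(codebook):
            y = synthesize(locs, amps, h, len(target))
            energy_y = y @ y
            if energy_y == 0.0:
                continue
            corr = target @ y
            gain = corr / energy_y
            err = energy_x - gain * corr  # squared error with the optimal gain
            if best is None or err < best[0]:
                best = (err, ci, vi, gain)
    return best[1], best[2], best[3]


if __name__ == "__main__":
    h = np.array([1.0, 0.5, 0.25])                            # toy impulse response
    candidates = [(0, 5, 10), (2, 7, 12), (3, 8, 13)]         # toy location sets
    codebook = [(1, 1, 1), (1, -1, 1), (-1, 1, 1), (1, 1, -1)]  # toy sign codevectors
    target = 2.0 * synthesize(candidates[1], codebook[1], h, 16)
    print(joint_search(target, candidates, codebook, h))
```

Searching locations and codevectors together, rather than fixing the locations first, is what lets the quantizer recover combinations that a sequential search would miss; the cost grows with the product of the two set sizes.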