    New technique to use the GMM in speaker recognition system (SRS)
    Citations (8), References (8), Related Papers (10)
    Abstract:
    Gaussian mixture models (GMMs) have been widely applied in speaker recognition systems (SRS) and are the baseline speaker modeling approach. A GMM is a joint probability density function (PDF) described by the weighted sum of several multivariate Gaussian PDFs; each multivariate Gaussian PDF is termed a mixture component. In the classical method, the number of mixture components (N) is fixed at the beginning of the training phase, so all speakers have a GMM with the identical number of mixtures (e.g., 16, 64, or 128). To enhance the effectiveness of GMM-based speaker recognition, we propose in this article a new technique, used within the GMM training algorithm, that calculates the best number of mixture components for each speaker model. Results show that the new method can improve performance compared with the basic GMM.
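The abstract does not say how the per-speaker component count is chosen. As one hedged illustration, the sketch below fits one-dimensional GMMs with plain EM for several candidate counts and keeps the count with the lowest Bayesian information criterion (BIC). BIC is a swapped-in, standard model-selection criterion, not the authors' method. The one-dimensional features, the candidate list, the synthetic data helper, and all function names are assumptions made for illustration; real systems use multivariate acoustic features.

```python
import math
import random


def gauss_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm_pdf(x, weights, means, variances):
    """Weighted sum of Gaussian PDFs: p(x) = sum_k w_k * N(x | mu_k, var_k)."""
    return sum(w * gauss_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


def fit_gmm(data, n_components, n_iter=50, var_floor=1e-3):
    """Plain EM for a 1-D GMM; means start at evenly spaced quantiles."""
    xs = sorted(data)
    n = len(xs)
    means = [xs[int((k + 0.5) * n / n_components)] for k in range(n_components)]
    mean_all = sum(xs) / n
    var_all = max(sum((x - mean_all) ** 2 for x in xs) / n, var_floor)
    variances = [var_all] * n_components
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each sample.
        resp = []
        for x in xs:
            joint = [w * gauss_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(joint) or 1e-300  # guard against numerical underflow
            resp.append([j / total for j in joint])
        # M-step: re-estimate weights, means, and (floored) variances.
        for k in range(n_components):
            nk = sum(r[k] for r in resp) + 1e-12
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(var_k, var_floor)
    return weights, means, variances


def log_likelihood(data, model):
    """Total log-likelihood of the data under model = (weights, means, variances)."""
    return sum(math.log(max(gmm_pdf(x, *model), 1e-300)) for x in data)


def bic(data, model):
    """BIC = (free parameters) * ln(n) - 2 * log-likelihood; lower is better."""
    k = len(model[0])
    n_params = 3 * k - 1  # k means + k variances + (k - 1) free weights
    return n_params * math.log(len(data)) - 2.0 * log_likelihood(data, model)


def best_component_count(data, candidates=(1, 2, 3, 4)):
    """Fit one GMM per candidate count and return the count with the lowest BIC."""
    scores = {k: bic(data, fit_gmm(data, k)) for k in candidates}
    return min(scores, key=scores.get), scores


def demo_speaker_data(seed=0, n_per_cluster=200):
    """Hypothetical 1-D 'features' drawn from two well-separated clusters."""
    rng = random.Random(seed)
    return ([rng.gauss(-5.0, 1.0) for _ in range(n_per_cluster)]
            + [rng.gauss(5.0, 1.0) for _ in range(n_per_cluster)])
```

Calling `best_component_count(demo_speaker_data())` returns the selected count together with the BIC score of every candidate, so each speaker's data can end up with a different model size instead of a shared fixed N.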
    Keywords:
    Component (thermodynamics)
    Bayesian methods are used to analyse the problem of training a model to make predictions about the probability distribution of data that has yet to be received. Mixture distributions emerge naturally from this framework, but are not ideally matched to the density estimation problems that arise in image processing. An extension, called a partitioned mixture distribution is presented, which is essentially a set of overlapping mixture distributions. An expectation maximisation training algorithm is derived for optimising partitioned mixture distributions according to the maximum likelihood description. Finally, the results of some numerical simulations are presented, which demonstrate that lateral inhibition arises naturally in partitioned mixture distributions, and that the nodes in a partitioned mixture distribution network co-operate in such a way that each mixture distribution in the partitioned mixture distribution receives its necessary complement of computing machinery.
    Complement
    Citations (21)
    Conventional speaker recognition systems use Gaussian mixture models (GMM) to model a speaker's voice based on the speaker's acoustic characteristics. This method is categorized as a non-discriminative training process, as the model-building process does not take into account the negative examples of the speaker. To increase the discriminative properties of a GMM for each speaker, a new approach that includes both positive and negative examples during the speaker training process is proposed. In this approach, speaker models are trained by moving the mixture model's means in such a way as to maximize the likelihood of speaker data while also minimizing the likelihood of negative examples for the speaker. The effectiveness of this approach on classification accuracies on speaker recognition tasks is tested on the NTIMIT database and NIST SRE 2003 corpora. The results indicate improvements in the performance of the system built using this new approach when compared to the traditional GMM-based speaker recognition systems.
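The abstract above describes moving the mixture means to raise the likelihood of the speaker's data while lowering the likelihood of negative examples, but it does not give the update rule. A minimal sketch of that idea, assuming a simple gradient-ascent step on the objective "average log-likelihood of positives minus `neg_weight` times average log-likelihood of negatives", is shown below. The one-dimensional features, the learning rate, `neg_weight`, and all function names are assumptions for illustration, not the paper's algorithm.

```python
import math


def gauss_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def responsibilities(x, weights, means, variances):
    """Posterior probability of each mixture component given the sample x."""
    joint = [w * gauss_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint) or 1e-300  # guard against numerical underflow
    return [j / total for j in joint]


def mean_gradient(data, weights, means, variances):
    """Average gradient of log p(x) with respect to each component mean.

    For a GMM, d log p(x) / d mu_k = gamma_k(x) * (x - mu_k) / var_k,
    where gamma_k(x) is the responsibility of component k for x.
    """
    grads = [0.0] * len(means)
    for x in data:
        resp = responsibilities(x, weights, means, variances)
        for k in range(len(means)):
            grads[k] += resp[k] * (x - means[k]) / variances[k]
    return [g / len(data) for g in grads]


def discriminative_mean_step(means, weights, variances, positives, negatives,
                             lr=0.1, neg_weight=0.5):
    """One gradient-ascent step: pull means toward positives, away from negatives."""
    g_pos = mean_gradient(positives, weights, means, variances)
    g_neg = mean_gradient(negatives, weights, means, variances)
    return [m + lr * (gp - neg_weight * gn)
            for m, gp, gn in zip(means, g_pos, g_neg)]


def avg_log_likelihood(data, weights, means, variances):
    """Average log-likelihood of the data under the GMM."""
    total = 0.0
    for x in data:
        p = sum(w * gauss_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
        total += math.log(max(p, 1e-300))
    return total / len(data)
```

Only the means are updated here, matching the abstract's description; weights and variances stay at their maximum-likelihood values.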
    Discriminative model
    NIST
    Speaker diarisation
    Citations (8)
    Because the mixing degree (the number of mixture components) of the traditional Gaussian mixture model is constant, the model does not conform to the characteristics of the speaker feature distribution. As a result, underfitting or overfitting can occur, which affects the speaker recognition rate. A new algorithm that improves the Gaussian mixture model is proposed and applied to speaker recognition. The algorithm adaptively adjusts the weight, mean, and covariance of each Gaussian component according to the distribution characteristics of the speaker's feature parameters, so that the improved Gaussian mixture model better fits the distribution of those parameters and the speaker recognition rate improves. Experiments showed that the speaker recognition rate of the improved Gaussian mixture model was higher than that of the traditional Gaussian mixture model.
    Feature (linguistics)
    Mixture modeling, which considers the potential heterogeneity in data, is widely adopted for classification and clustering problems. Mixture models can be estimated using the Expectation-Maximization algorithm, which works with the complete estimating equations conditioned by the latent membership variables of the cluster assignment based on the hierarchical expression of mixture models. However, when the mixture components have light tails such as a normal distribution, the mixture model can be sensitive to outliers. This study proposes a method of weighted complete estimating equations (WCE) for the robust fitting of mixture models. Our WCE introduces weights to complete estimating equations such that the weights can automatically downweight the outliers. The weights are constructed similarly to the density power divergence for mixture models, but in our WCE, they depend only on the component distributions and not on the whole mixture. A novel expectation-estimating-equation (EEE) algorithm is also developed to solve the WCE. For illustrative purposes, a multivariate Gaussian mixture, a mixture of experts, and a multivariate skew normal mixture are considered, and how our EEE algorithm can be implemented for these specific models is described. The numerical performance of the proposed robust estimation method was examined using simulated and real datasets.
    Divergence (linguistics)
    Citations (0)
    Speaker clustering is used in serial-structure speaker identification systems to reduce algorithmic delay and computational complexity. The speech is first classified into a speaker group, and the most likely speaker is then searched for within that group. The difference between Gaussian mixture models (GMMs) is widely used in speaker classification. The paper proposes a novel measure of the difference between GMMs based on pseudo-divergence, the ratio of inter-model dispersion to intra-model dispersion, and uses this measure to perform speaker clustering. Experiments indicate that the measure represents the difference between GMMs well and improves the performance of speaker clustering.
    Speaker identification
    Speaker diarisation
    Divergence (linguistics)
    Citations (2)
    Feature (linguistics)
    Feature vector
    Speaker diarisation
    Citations (0)
    Despite intuitive expectation and experimental evidence that phonemes contain useful speaker-discriminating information, the phoneme-based speaker recognition systems reported so far have not been found to perform better than phoneme-independent speaker recognition systems based on the Gaussian mixture model (GMM). The paper proposes a new phoneme-based speaker verification technique that uses models obtained by adaptation of well-trained speaker GMMs. The proposed system was found to consistently outperform comparably sized phoneme-independent GMM-based speaker verification systems in experiments with clean and telephone speech databases.
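The abstract above does not state which adaptation scheme derives the phoneme models from the speaker GMMs. As an illustration only, the sketch below shows relevance-MAP adaptation of the means, a scheme commonly used to adapt a well-trained GMM to a small amount of new data: each mean is interpolated between its prior value and the data mean for that component, with the weight set by how much data the component received. The one-dimensional features, the relevance factor of 16, and the function names are assumptions, not details taken from the paper.

```python
import math


def gauss_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def map_adapt_means(data, weights, means, variances, relevance=16.0):
    """Relevance-MAP adaptation of GMM means (weights and variances unchanged).

    For each component k:
        n_k    = sum_t gamma_k(x_t)              (soft count)
        E_k    = sum_t gamma_k(x_t) * x_t / n_k  (data mean for component k)
        alpha  = n_k / (n_k + relevance)
        mu_new = alpha * E_k + (1 - alpha) * mu_old
    """
    n_comp = len(means)
    soft_counts = [0.0] * n_comp
    first_order = [0.0] * n_comp
    for x in data:
        joint = [w * gauss_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(joint) or 1e-300  # guard against numerical underflow
        for k in range(n_comp):
            gamma = joint[k] / total
            soft_counts[k] += gamma
            first_order[k] += gamma * x
    adapted = []
    for k in range(n_comp):
        if soft_counts[k] > 0.0:
            alpha = soft_counts[k] / (soft_counts[k] + relevance)
            data_mean = first_order[k] / soft_counts[k]
            adapted.append(alpha * data_mean + (1.0 - alpha) * means[k])
        else:
            # A component that saw no data keeps its prior mean.
            adapted.append(means[k])
    return adapted
```

With this rule, components that see many adaptation frames move close to the new data, while rarely visited components stay near the well-trained prior model.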
    Speaker Verification
    Speaker diarisation
    Citations (15)
    This paper provides an overview of the Gaussian mixture model (GMM) and its components as applied to speech signals. Earlier work has shown that the Gaussian mixture model is well suited to voice modeling in speaker recognition systems. For speaker recognition, the Gaussian mixture model is an essential tool of statistical clustering. Tasks that humans perform effortlessly, such as voice recognition or face recognition, are not effortless for machines or computers; speaker recognition technology provides a solution for this purpose, and with this technology computers and machines can outperform humans.
    Component (thermodynamics)
    Citations (2)
    The design and implementation of a real-time speaker recognition system based on the Gaussian mixture model (GMM) are presented. The system supports real-time speaker identification and real-time speaker verification. In a lab environment, the performance of the system, as well as model adaptation, was fully tested with GMMs of different numbers of Gaussian mixtures and at different sampling rates. The test results show that the GMM-based system achieves satisfactory accuracy in speaker recognition.
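The two tasks named in the abstract above differ mainly in the decision rule applied to GMM scores. A minimal sketch under stated assumptions: identification picks the enrolled speaker whose model gives the highest average log-likelihood, and verification compares a claimed speaker's model against a background model through a log-likelihood ratio and a threshold. The background model, the zero threshold, the one-dimensional features, and the function names are illustrative assumptions; the abstract does not describe the system's actual scoring.

```python
import math


def gauss_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def avg_log_likelihood(frames, model):
    """Average per-frame log-likelihood under model = (weights, means, variances)."""
    weights, means, variances = model
    total = 0.0
    for x in frames:
        p = sum(w * gauss_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
        total += math.log(max(p, 1e-300))  # guard against log(0)
    return total / len(frames)


def identify(frames, speaker_models):
    """Closed-set identification: return the best-scoring enrolled speaker."""
    return max(speaker_models,
               key=lambda name: avg_log_likelihood(frames, speaker_models[name]))


def verify(frames, claimed_model, background_model, threshold=0.0):
    """Verification: accept if the log-likelihood ratio exceeds the threshold."""
    llr = (avg_log_likelihood(frames, claimed_model)
           - avg_log_likelihood(frames, background_model))
    return llr > threshold


# Hypothetical single-component models, for illustration only.
SPEAKERS = {
    "alice": ([1.0], [-2.0], [1.0]),
    "bob": ([1.0], [2.0], [1.0]),
}
BACKGROUND = ([1.0], [0.0], [4.0])
```

For example, `identify([1.8, 2.1, 2.3], SPEAKERS)` selects the model whose mean lies closest to the frames, and `verify` with the same frames accepts a claim for that speaker and rejects a claim for the other.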
    Speaker diarisation
    Citations (0)