An Adaptive Segmentation Method Based on Gaussian Mixture Model (GMM) Clustering for DNA Microarray
Citations (6) · References (5) · Related Papers (10)
Abstract:
Microarray technology allows us to efficiently analyse valuable gene expression data. In this paper we propose an effective methodology for the analysis of microarrays. First, a new gridding algorithm is proposed to locate all individual spots and determine their borders. Then, a classical Gaussian Mixture Model (GMM) is used to analyse array spots more flexibly and adaptively. The Expectation-Maximization (EM) algorithm is used to estimate the GMM parameters by the Maximum Likelihood (ML) approach. We also address the problem of artifacts by detecting and compensating for them using GMM mixture components; artifact data present in foreground and background spots are corrected using mathematical morphology and histogram analysis methods.
Keywords:
Spots
Mixture theory
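As a rough illustration of the spot-segmentation step described in the abstract, the sketch below fits a two-component GMM by EM to the pixel intensities of a synthetic spot patch and treats the brighter component as foreground. The data, the patch construction, and the use of scikit-learn's GaussianMixture are illustrative assumptions, not the paper's actual pipeline (which also performs gridding and artifact correction).

```python
# Hypothetical sketch: separating a microarray spot's foreground from its
# background by fitting a two-component GMM to pixel intensities with EM.
# The synthetic "patch" stands in for one gridded spot; all names are illustrative.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
background = rng.normal(loc=50, scale=10, size=900)    # dim background pixels
foreground = rng.normal(loc=180, scale=20, size=100)   # bright spot pixels
patch = np.concatenate([background, foreground]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
labels = gmm.fit_predict(patch)          # EM runs inside fit_predict
fg = int(np.argmax(gmm.means_.ravel()))  # brighter component = foreground
print("foreground mean intensity:", gmm.means_.ravel()[fg])
print("foreground pixel count:", int((labels == fg).sum()))
```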
As an extremely powerful probability model, the Gaussian mixture model (GMM) has been widely used in pattern recognition, information processing and data mining. If the number of Gaussians in the mixture is known in advance, the well-known Expectation-Maximization (EM) algorithm can be used to estimate the parameters of the mixture. In many practical applications, however, the number of components is not known. Gaussian mixture modelling then becomes a compound problem of determining the number of Gaussian components and estimating the parameters of the mixture, which is rather difficult. In this paper, we propose a split-and-merge EM (SMEM) algorithm to decide the number of components, i.e. to perform model selection for the mixture. Based on the minimum description length (MDL) criterion, the proposed SMEM algorithm avoids the local-optimum drawback of the usual EM algorithm and determines the number of components in the Gaussian mixture model automatically. By splitting and merging incorrect components, the algorithm converges to the maximum of the MDL criterion function and obtains a better parameter estimate of the Gaussian mixture with the correct number of components. Experiments demonstrate that the proposed split-and-merge EM algorithm performs both parameter learning and model selection efficiently for Gaussian mixtures.
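The split and merge moves of SMEM and its exact MDL criterion are not reproduced below. As a hedged stand-in, this sketch scores candidate component counts with BIC, a closely related description-length-style criterion, and keeps the count that minimizes it; the data and names are illustrative.

```python
# Minimal sketch of description-length-style model selection for a GMM:
# fit candidate models over a range of component counts and keep the one
# minimizing BIC (used here as a stand-in for the paper's MDL criterion).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (300, 2)),
               rng.normal(5, 1, (300, 2)),
               rng.normal((0, 6), 1, (300, 2))])   # three synthetic clusters

candidates = [GaussianMixture(n_components=k, n_init=3, random_state=1).fit(X)
              for k in range(1, 7)]
best = min(candidates, key=lambda m: m.bic(X))
print("selected number of components:", best.n_components)
```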
Merge
Minimum description length
Mixture theory
Citations (32)
In this paper, we use the mean shift procedure to determine the number of components in a mixture model and to detect the mode of each mixture component. We then adopt the Gaussian mixture model to represent the probability distribution of the feature vectors, and use a deterministic annealing expectation-maximization (DAEM) algorithm to estimate the parameters of the GMM. The experimental results show that the mean shift part of the proposed algorithm efficiently determines the number of components and the mode of each component in the mixture model, while the DAEM part provides a globally optimal solution for the parameter estimation.
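A minimal sketch of that pipeline, assuming scikit-learn: mean shift supplies the number of modes and their centers, which then seed a GMM fit. Standard EM is used here in place of the deterministic-annealing EM variant the paper describes.

```python
# Illustrative pairing of mean shift (to count modes) with a GMM fit.
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-4, 0.8, (250, 2)), rng.normal(4, 0.8, (250, 2))])

bw = estimate_bandwidth(X, quantile=0.2)
ms = MeanShift(bandwidth=bw).fit(X)
k = len(ms.cluster_centers_)              # number of detected modes
print("modes found by mean shift:", k)

# seed EM with the mean-shift modes instead of a random initialization
gmm = GaussianMixture(n_components=k, means_init=ms.cluster_centers_,
                      random_state=2).fit(X)
print("GMM means:\n", gmm.means_)
```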
Mean-shift
Component
Adaptive simulated annealing
Feature (machine learning)
Citations (0)
We describe $k$-MLE, a fast and efficient local search algorithm for learning finite statistical mixtures of exponential families such as Gaussian mixture models. Mixture models are traditionally learned using the expectation-maximization (EM) soft clustering technique, which monotonically increases the incomplete (expected complete) likelihood. Given prescribed mixture weights, the hard clustering $k$-MLE algorithm iteratively assigns data to the most likely weighted component and updates the component models using Maximum Likelihood Estimators (MLEs). Using the duality between exponential families and Bregman divergences, we prove that the local convergence of the complete likelihood of $k$-MLE follows directly from the convergence of a dual additively weighted Bregman hard clustering. The inner loop of $k$-MLE can be implemented using any $k$-means heuristic, such as Lloyd's batched update or Hartigan's greedy swap update. We then show how to update the mixture weights by minimizing a cross-entropy criterion, which amounts to setting the weights to the relative proportions of cluster points, and we reiterate the mixture parameter and mixture weight updates until convergence. Hard EM is interpreted as a special case of $k$-MLE in which the component update and the weight update are performed successively in the inner loop. To initialize $k$-MLE, we propose $k$-MLE++, a careful initialization of $k$-MLE that probabilistically guarantees a global bound on the best possible complete likelihood.
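A compact, hypothetical rendering of that hard-assignment loop for 1-D Gaussians follows: each point is assigned to its most likely weighted component, components are refit by MLE, and weights are set to cluster proportions. It makes no attempt to reproduce the Bregman-divergence analysis or the $k$-MLE++ initialization.

```python
# Hard-assignment k-MLE-style loop for a 1-D Gaussian mixture (sketch only).
import numpy as np

rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(0, 1, 400), rng.normal(6, 1.5, 600)])

k = 2
mu = rng.choice(x, size=k)               # crude init (k-MLE++ not shown)
var = np.full(k, x.var())
w = np.full(k, 1.0 / k)

for _ in range(50):
    # hard E-step: log of the weighted Gaussian density per component
    logp = (np.log(w) - 0.5 * np.log(2 * np.pi * var)
            - (x[:, None] - mu) ** 2 / (2 * var))
    z = logp.argmax(axis=1)              # most likely weighted component
    # M-step: per-component MLEs, then weights as cluster proportions
    for j in range(k):
        pts = x[z == j]
        if pts.size:
            mu[j], var[j] = pts.mean(), pts.var() + 1e-9
            w[j] = pts.size / x.size

print("means:", mu, "weights:", w)
```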
Initialization
Citations (34)
The expectation-maximization (EM) algorithm is a popular approach for parameter estimation of finite mixture models (FMMs). A drawback of this approach is that the number of components of the finite mixture model is not known in advance, even though it is a key input to EM algorithms. In this paper, a penalized minimum-matching-distance-guided EM algorithm is discussed. Under the framework of Greedy EM, a fast and accurate algorithm for estimating the number of components of the Gaussian mixture model (GMM) is proposed. The performance of this algorithm is validated via simulation experiments on univariate and bivariate Gaussian mixture models.
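As a sketch of the greedy strategy only, the loop below grows the number of components one at a time and stops when the model-selection score stops improving; BIC stands in for the paper's penalized minimum-matching-distance criterion, which is not reproduced.

```python
# Greedy-EM-style search for the number of GMM components (sketch).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)
X = np.concatenate([rng.normal(0, 1, 400), rng.normal(5, 1, 400)]).reshape(-1, 1)

best = GaussianMixture(n_components=1, random_state=4).fit(X)
while True:
    trial = GaussianMixture(n_components=best.n_components + 1,
                            n_init=3, random_state=4).fit(X)
    if trial.bic(X) >= best.bic(X):
        break                            # no improvement: keep smaller model
    best = trial
print("estimated number of components:", best.n_components)
```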
Univariate
Citations (7)
In general, a Gaussian mixture model (GMM) is used to estimate the speaker model from speech for speaker identification. The parameter estimates of the GMM are obtained using the Expectation-Maximization (EM) algorithm for maximum likelihood (ML) estimation. However, the EM algorithm has the drawbacks that it depends heavily on the initialization and that it needs the number of mixture components to be known in advance. In this paper, to solve these problems of the EM algorithm, we propose an EM algorithm for GMMs whose initialization is based on incremental $k$-means. The proposed method dynamically increases the number of mixture components one at a time until the optimal number is found. Whenever a component is added, we measure the mutual relationship between it and each of the previously obtained components; from these relationships, we can estimate the optimal number of statistically independent mixture components. The effectiveness of the proposed method is shown by experiments on artificial data and real speech. In the speaker identification experiments, the proposed method outperformed existing approaches while also reducing the computational cost without loss of performance.
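A rough sketch in that spirit, assuming scikit-learn: each EM run is seeded from $k$-means centers, the component count grows one at a time, and growth stops when a simple score worsens. BIC is used here purely as a stand-in for the paper's mutual-relationship test.

```python
# Incremental k-means-initialized EM for a GMM (illustrative sketch).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(5)
X = np.concatenate([rng.normal(-3, 1, 300), rng.normal(3, 1, 300)]).reshape(-1, 1)

prev_bic, k = np.inf, 1
while True:
    # seed EM with k-means centers instead of a random initialization
    centers = KMeans(n_clusters=k, n_init=10, random_state=5).fit(X).cluster_centers_
    gmm = GaussianMixture(n_components=k, means_init=centers, random_state=5).fit(X)
    if gmm.bic(X) >= prev_bic:
        break                            # adding a component stopped helping
    prev_bic, k = gmm.bic(X), k + 1
print("selected number of mixtures:", k - 1)
```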
Initialization
Maximization
Identification
Citations (0)
A Bayesian Network (BN) is a model that applies Bayes' principle under the assumption that input variables can be interdependent. A BN is described as a graph consisting of nodes and arcs: nodes represent variables, while arcs represent the relationships between them. The joint probability distribution over the nodes of a BN can be built using Gaussian mixture models (GMMs), a type of density model consisting of component Gaussian functions. There are three types of mixture models: the probability mixture model, the parametric mixture model and the continuous mixture model. GMM parameters can be estimated using the expectation-maximization (EM) algorithm, an iterative method that alternates between an expectation step (E-step) and a maximization step (M-step) and is often used to find the Maximum Likelihood (ML) estimates of the parameters of a probabilistic model that depends on unknown latent variables. The E-step computes the expected value of the log-likelihood function, while the M-step maximizes that expected value. An advantage of the EM algorithm is that it can solve mixture parameter estimation problems, including estimation from incomplete data: by assuming the existence of values for additional hidden variables, it can handle log-likelihood functions that are difficult to maximize by direct analysis.
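To make the E-step/M-step alternation concrete, here is a minimal, self-contained EM loop for a two-component 1-D GMM; all data and starting values are illustrative.

```python
# Minimal EM for a 1-D two-component GMM, showing the E-step
# (responsibilities) and M-step (weighted MLE updates) explicitly.
import numpy as np

rng = np.random.default_rng(6)
x = np.concatenate([rng.normal(0, 1, 500), rng.normal(4, 0.7, 500)])

w, mu, var = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])
for _ in range(100):
    # E-step: responsibility of each component for each point
    dens = w / np.sqrt(2 * np.pi * var) * np.exp(-(x[:, None] - mu) ** 2 / (2 * var))
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: maximize the expected complete-data log-likelihood
    n_k = r.sum(axis=0)
    mu = (r * x[:, None]).sum(axis=0) / n_k
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / n_k
    w = n_k / x.size

print("weights:", w.round(3), "means:", mu.round(3))
```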
Marginal likelihood
Citations (3)
Images produced in emission tomography with the expectation-maximization algorithm have been observed to become noisier and to show large distortions near edges as the iterations proceed and the images converge towards the maximum-likelihood estimate. We conclude that these artifacts are fundamental to reconstructions based on maximum-likelihood estimation as it is usually applied; they are not due to the expectation-maximization algorithm itself, which is but one numerical approach for finding the maximum-likelihood estimate. In this paper, we develop a mathematical approach for suppressing both the noise and the edge artifacts by modifying the maximum-likelihood approach to include constraints which the estimate must satisfy.
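For concreteness, the unconstrained MLEM iteration whose artifacts motivate the paper's constraints can be sketched on a toy problem as below; the system matrix and sizes are made up, and none of the constraint machinery the paper proposes is included.

```python
# Toy unconstrained MLEM iteration for emission tomography (sketch only).
import numpy as np

rng = np.random.default_rng(7)
A = rng.uniform(0.0, 1.0, (20, 8))        # toy detector-response matrix
truth = rng.uniform(1.0, 5.0, 8)          # true emission intensities
y = rng.poisson(A @ truth)                # Poisson-distributed counts

lam = np.ones(8)                          # nonnegative initial estimate
sens = A.sum(axis=0)                      # sensitivity term A^T 1
for _ in range(200):
    lam *= (A.T @ (y / (A @ lam))) / sens # multiplicative MLEM update
print("estimate:", lam.round(2))
print("truth:   ", truth.round(2))
```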
Maximization
Citations (418)
The expectation-maximization algorithm has classically been used to find the maximum likelihood estimates of parameters in mixture probabilistic models. Problems with the EM algorithm are that parameter initialization depends on prior knowledge and that it easily converges to a local maximum during the iteration process. In this paper, a new method of estimating the parameters of a GMM based on split EM is proposed. It starts from a single mixture component, then sequentially splits components and estimates their parameters during the expectation-maximization steps. Extensive experiments show the advantages and efficiency of the proposed method.
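A rough sketch of such a split-EM strategy, assuming scikit-learn: starting from a single component, the highest-variance component is repeatedly split into two offset copies and EM is re-run, stopping when BIC (a stand-in stopping criterion, not the paper's) no longer improves.

```python
# Split-EM-style growth of a 1-D GMM from a single component (sketch).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(8)
X = np.concatenate([rng.normal(-5, 1, 300), rng.normal(0, 1, 300),
                    rng.normal(5, 1, 300)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=1, random_state=8).fit(X)
while True:
    j = int(np.argmax(gmm.covariances_.ravel()))     # component to split
    m, s = gmm.means_[j, 0], np.sqrt(gmm.covariances_.ravel()[j])
    means = np.vstack([np.delete(gmm.means_, j, axis=0),
                       [[m - s]], [[m + s]]])        # replace it with two
    trial = GaussianMixture(n_components=gmm.n_components + 1,
                            means_init=means, random_state=8).fit(X)
    if trial.bic(X) >= gmm.bic(X):
        break                                        # splitting stopped helping
    gmm = trial
print("components after splitting:", gmm.n_components)
```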
Initialization
Maximization
Component
Citations (5)