Principal Component Analysis versus Factor Analysis
Citations: 0 · References: 13 · Related papers: 10
Abstract:
The article discusses selected problems related to both principal component analysis (PCA) and factor analysis (FA). In particular, the two types of analysis were compared, and a vector interpretation for both PCA and FA was proposed. The problem of determining the number of principal components in PCA and of factors in FA was discussed in detail. A new criterion for determining the number of factors and principal components is presented, which makes it possible to represent most of the variance of each of the analyzed primary variables. An efficient algorithm for determining the number of factors in FA that complies with this criterion was also proposed, and this algorithm was adapted to find the number of principal components in PCA. A modification of the PCA algorithm using the new method of determining the number of principal components was also proposed. The obtained results were discussed.
Keywords: Sparse PCA, Factor Analysis
Principal component analysis (PCA) is widely used for feature extraction and dimension reduction in pattern recognition and data analysis. Despite its popularity, the reduced dimension obtained from the PCA is difficult to interpret due to the dense structure of principal loading vectors. To address this issue, several methods have been proposed for sparse PCA, all of which estimate loading vectors with few non-zero elements. However, when more than one principal component is estimated, the associated loading vectors do not possess the same sparsity pattern. Therefore, it becomes difficult to determine a small subset of variables from the original feature space that have the highest contribution in the principal components. To address this issue, an adaptive block sparse PCA method is proposed. The proposed method is guaranteed to obtain the same sparsity pattern across all principal components. Experiments show that applying the proposed sparse PCA method can help improve the performance of feature selection for image processing applications. We further demonstrate that our proposed sparse PCA method can be used to improve the performance of blind source separation for functional magnetic resonance imaging data.
Keywords: Sparse PCA, Feature vector
Citations (54)
The significant amount of variance in head-related transfer functions (HRTFs) resulting from source-location and subject dependencies has led researchers to use principal components analysis (PCA) to approximate HRTFs with a small set of basis functions. PCA minimizes a mean-square error, and may consequently spend modeling effort on perceptually irrelevant properties. To investigate the extent of this effect, PCA performance was studied before and after removal of perceptually irrelevant variance. The results indicate that from the sixth PCA component onward, a substantial amount of perceptually irrelevant variance is accounted for.
Keywords: Explained variation, Variance components
Citations (5)
In the analysis of bioinformatics data, a unique challenge arises from the high dimensionality of measurements. Without loss of generality, we use genomic studies with gene expression measurements as a representative example, but note that the analysis techniques discussed in this article are also applicable to other types of bioinformatics studies. Principal component analysis (PCA) is a classic dimension reduction approach. It constructs linear combinations of gene expressions, called principal components (PCs). The PCs are orthogonal to each other, can effectively explain variation in gene expressions, and may have a much lower dimensionality. PCA is computationally simple and can be realized using many existing software packages. This article consists of the following parts. First, we review the standard PCA technique and its applications in bioinformatics data analysis. Second, we describe recent 'non-standard' applications of PCA, including accommodating interactions among genes, pathways and network modules, and conducting PCA with estimating equations as opposed to gene expressions. Third, we introduce several recently proposed PCA-based techniques, including supervised PCA, sparse PCA and functional PCA. Supervised PCA and sparse PCA have been shown to have better empirical performance than standard PCA; functional PCA can analyze time-course gene expression data. Last, we raise awareness of several critical but unsolved problems related to PCA. The goal of this article is to make bioinformatics researchers aware of the PCA technique and, more importantly, its most recent developments, so that this simple yet effective dimension reduction technique can be better employed in bioinformatics data analysis.
Keywords: Sparse PCA
Citations (204)
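The standard PCA construction reviewed above can be sketched in a few lines. The following is a minimal NumPy illustration (not code from the cited article), using an eigendecomposition of the sample covariance matrix; the data are synthetic:

```python
import numpy as np

def pca(X, n_components):
    """Plain PCA via eigendecomposition of the sample covariance matrix."""
    Xc = X - X.mean(axis=0)                        # center each variable
    cov = np.cov(Xc, rowvar=False)                 # p x p sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)         # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]              # re-sort descending
    components = eigvecs[:, order[:n_components]]  # loading vectors (columns)
    scores = Xc @ components                       # principal component scores
    explained = eigvals[order][:n_components] / eigvals.sum()
    return scores, components, explained

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 1] += 2 * X[:, 0]                             # correlate two columns
scores, components, explained = pca(X, 2)
print(explained)   # fraction of total variance captured by each retained PC
```

Because the retained loading vectors are orthonormal eigenvectors, the resulting scores are uncorrelated. Note also that every variable contributes to every loading vector; this density is exactly what the sparse PCA variants discussed in these abstracts are designed to remove.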
Factor analysis and principal component analysis (PCA) are mathematically related: both rely on calculating eigenvectors (of a correlation matrix, or of a covariance matrix of normalized data), both are data reduction techniques that help reduce the dimensionality of the data, and their outputs look very much the same. Despite these similarities, factor analysis and PCA solve different problems. Each principal component is a linear combination of the observed variables (constructed so that the principal components are orthogonal), whereas factor analysis is a measurement model of a latent variable. PCA is a data reduction technique that calculates new variables from the set of measured variables. Factor analysis also leads to data reduction, but it answers a fundamentally different question: it is a model that tries to identify a latent variable.
Keywords: Factor Analysis, Sparse PCA, Canonical correlation, Data Reduction, Data set
Citations (0)
Principal component analysis (PCA) is a widespread exploratory data analysis tool. Sparse principal component analysis (SPCA) is a method that improves upon PCA by increasing the number of zeros in the loading vectors of PCA results. This makes the results more understandable and more usable. This bachelor's thesis introduces both methods, and includes examples using both real-world data and artifcial data. Also, the behavior of PCA under departure from weakly stationary data is explored.
Sparse PCA
Exploratory data analysis
USable
Component (thermodynamics)
Cite
Citations (2)
Abstract Principal Component Analysis (PCA) is the main method of dimension reduction and data processing when the dataset is of high dimension. Therefore, PCA is a widely used method in almost all scientific fields. Because PCA is a linear combination of the original variables, the interpretation process of the analysis results is often encountered with some difficulties. The approaches proposed for solving these problems are called to as Sparse Principal Component Analysis (SPCA). Sparse approaches are not robust in existence of outliers in the data set. In this study, the performance of the approach proposed by Croux et al. (2013), which combines the advantageous properties of SPCA and Robust Principal Component Analysis (RPCA), will be examined through one real and three artificial datasets in the situation of full sparseness. In the light of the findings, it is recommended to use robust sparse PCA based on projection pursuit in analyzing the data. Another important finding obtained from the study is that the BIC and TPO criteria used in determining lambda are not much superior to each other. We suggest choosing one of these two criteria that give an optimal result.
Sparse PCA
Projection pursuit
Cite
Citations (1)
The recently referred sparse principal component analysis(S-PCA)is a method of multivariate statistical analysis,which has been used in date processing and dimensionality reduction successfully.In this paper,we point out the advantage of sparse principal component analysis,and give all kinds of algorithms to solve sparse principal component.Finally,we introduce various S-PCA to comprehensive evaluation and explain the efficiency on the basis of examples.
Sparse PCA
Component (thermodynamics)
Component analysis
Cite
Citations (0)
Principal component analysis (PCA) is widely used in data processing and dimensionality reduction. However, PCA suffers from the fact that each principal component is a linear combination of all the original variables, thus it is often difficult to interpret the results. We introduce a new method called sparse principal component analysis (SPCA) using the lasso (elastic net) to produce modified principal components with sparse loadings. We first show that PCA can be formulated as a regression-type optimization problem; sparse loadings are then obtained by imposing the lasso (elastic net) constraint on the regression coefficients. Efficient algorithms are proposed to fit our SPCA models for both regular multivariate data and gene expression arrays. We also give a new formula to compute the total variance of modified principal components. As illustrations, SPCA is applied to real and simulated data with encouraging results.
Sparse PCA
Elastic net regularization
Lasso
Principal component regression
Cite
Citations (3,042)
Principal component analysis (PCA) has been widely used for data dimension reduction and process fault detection. However, interpreting the principal components and the outcomes of PCA-based monitoring techniques is a challenging task since each principal component is a linear combination of the original variables which can be numerous in most modern applications. To address this challenge, we first propose the use of sparse principal component analysis (SPCA) where the loadings of some variables in principal components are restricted to zero. This paper then describes a technique to determine the number of non-zero loadings in each principal component. Furthermore, we compare the performance of PCA and SPCA in fault detection. The validity and potential of SPCA are demonstrated through simulated data and a comparative study with the benchmark Tennessee Eastman process.
Sparse PCA
Benchmark (surveying)
Cite
Citations (13)
주성분 분석(principal component analysis; PCA)은 서로 상관되어 있는 다변량 자료의 차원을 축소하는 대표적인 기법으로 많은 다변량 분석에서 활용되고 있다. 하지만 주성분은 모든 변수들의 선형결합으로 이루어지므로, 그 결과의 해석이 어렵다는 한계가 있다. sparse PCA(SPCA) 방법은 elastic net 형태의 벌점함수를 이용하여 보다 성긴(sparse) 적재를 가진 수정된 주성분을 만들어주지만, 변수들의 그룹구조를 이용하지 못한다는 한계가 있다. 이에 본 연구에서는 기존 SPCA를 개선하여, 자료가 그룹화되어 있는 경우에 유의한 그룹을 선택함과 동시에 그룹 내 불필요한 변수를 제거할 수 있는 새로운 주성분 분석 방법을 제시하고자 한다. 그룹과 그룹 내 변수 구조를 모형 적합에 이용하기 위하여, sparse 주성분 분석에서의 elastic net 벌점함수 대신에 계층적 벌점함수 형태를 고려하였다. 또한 실제 자료의 분석을 통해 제안 방법의 성능 및 유용성을 입증하였다. Principal component analysis (PCA) describes the variation of multivariate data in terms of a set of uncorrelated variables. Since each principal component is a linear combination of all variables and the loadings are typically non-zero, it is difficult to interpret the derived principal components. Sparse principal component analysis (SPCA) is a specialized technique using the elastic net penalty function to produce sparse loadings in principal component analysis. When data are structured by groups of variables, it is desirable to select variables in a grouped manner. In this paper, we propose a new PCA method to improve variable selection performance when variables are grouped, which not only selects important groups but also removes unimportant variables within identified groups. To incorporate group information into model fitting, we consider a hierarchical lasso penalty instead of the elastic net penalty in SPCA. Real data analyses demonstrate the performance and usefulness of the proposed method.
Sparse PCA
Elastic net regularization
Lasso
Penalty Method
Uncorrelated
Data set
Cite
Citations (0)