A Kernel PCA Radial Basis Function Neural Networks and Application
Abstract:
This paper reviews classical principal component analysis (PCA) methods for multivariate data analysis and feature extraction in pattern classification. A kernel-based extension of the classical PCA model is discussed to cope with nonlinear data dependencies: kernel PCA implicitly performs linear PCA in a high-dimensional kernel feature space that is nonlinearly related to the input space through a suitable kernel function mapping. The combination of kernel PCA and RBF neural networks is then proposed, with attention to practical and algorithmic considerations. Finally, the usefulness of kernel PCA algorithms is illustrated by an application of kernel PCA RBF neural networks to handwritten digit classification.
Keywords: Kernel (algebra), String kernel
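To make the pipeline in the abstract concrete, the sketch below applies kernel PCA to digit images and then trains a small RBF network on the projected features. It is a minimal illustration, assuming an RBF kernel for the KPCA step, k-means for choosing the hidden centers, and a least-squares output layer; the dataset (scikit-learn's digits), dimensions, and kernel parameters are stand-ins, since the paper's exact choices are not given in the abstract.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.decomposition import KernelPCA
from sklearn.cluster import KMeans

# Load handwritten digits and split (stand-in for the paper's data).
X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Step 1: kernel PCA, i.e., linear PCA performed implicitly in an RBF kernel feature space.
kpca = KernelPCA(n_components=40, kernel="rbf", gamma=1e-3)
Z_tr = kpca.fit_transform(X_tr)
Z_te = kpca.transform(X_te)

# Step 2: a simple RBF network on the KPCA features.
# Hidden centers come from k-means; the output layer is solved by least squares.
n_centers = 50
centers = KMeans(n_clusters=n_centers, n_init=10, random_state=0).fit(Z_tr).cluster_centers_

def rbf_layer(Z, centers, width):
    # Gaussian activation of each sample with respect to each center.
    d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * width ** 2))

# Width heuristic: median pairwise distance between centers.
width = np.median(np.sqrt(((centers[:, None] - centers[None, :]) ** 2).sum(-1)))
H_tr = rbf_layer(Z_tr, centers, width)
H_te = rbf_layer(Z_te, centers, width)

# One-hot targets and linear readout weights via least squares.
T = np.eye(10)[y_tr]
W, *_ = np.linalg.lstsq(H_tr, T, rcond=None)

pred = H_te @ W
print(f"test accuracy: {(pred.argmax(1) == y_te).mean():.3f}")
```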
Related Papers

Kernel learning is a challenging issue that has been extensively investigated over the last decades. The performance of kernel-based methods relies largely on selecting an appropriate kernel, and a fundamental problem in the machine learning community is how to model a suitable one. Traditional kernels, e.g., the Gaussian and polynomial kernels, are not flexible enough to exploit the information in the given data and cannot sufficiently capture the characteristics of data similarities. To alleviate this problem, this paper presents a Flexible Kernel obtained by Negotiating between Data-dependent and Task-dependent kernel learning, termed FKNDT. The method learns a suitable kernel as the Hadamard product of two kinds of kernels: a data-dependent kernel and a set of pre-specified classical kernels acting as a task-dependent kernel. The flexible kernel is evaluated in a supervised manner via Support Vector Machines (SVM), and the learning process is modeled as a joint optimization problem comprising data-dependent kernel matrix learning, multiple kernel learning by means of quadratic programming, and standard SVM optimization. Experimental results demonstrate that this technique yields a more effective kernel than the traditional ones and outperforms other state-of-the-art kernel-based algorithms in terms of classification accuracy on fifteen benchmark datasets.
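The sketch below illustrates only the structural idea behind FKNDT: form the Hadamard (elementwise) product of a data-dependent kernel matrix and a combination of classical kernels, then train an SVM on the precomputed result. The mixing weights, the particular base kernels, and the use of an RBF similarity as the data-dependent part are illustrative assumptions; in the paper, the weights and the data-dependent kernel are learned by optimization rather than fixed.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel, polynomial_kernel
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Task-dependent part: a fixed convex combination of classical base kernels.
# (FKNDT learns these weights via quadratic programming; they are hard-coded here.)
w = [0.5, 0.5]
def task_kernel(A, B):
    return w[0] * rbf_kernel(A, B, gamma=0.5) + w[1] * polynomial_kernel(A, B, degree=2)

# Data-dependent part: a simple similarity built from the data itself,
# a stand-in for the learned data-dependent kernel matrix in the paper.
def data_kernel(A, B):
    return rbf_kernel(A, B, gamma=1.0 / A.shape[1])

# Flexible kernel = Hadamard (elementwise) product of the two parts.
K_tr = data_kernel(X_tr, X_tr) * task_kernel(X_tr, X_tr)
K_te = data_kernel(X_te, X_tr) * task_kernel(X_te, X_tr)

clf = SVC(kernel="precomputed").fit(K_tr, y_tr)
print("test accuracy:", clf.score(K_te, y_te))
```

By the Schur product theorem the elementwise product of two positive semi-definite kernels is again positive semi-definite, so the combined matrix is a valid SVM kernel.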
Kernel techniques became popular along with the rising success of Support Vector Machines (SVM). During the last two decades, the kernel idea itself has been extracted from SVM and is now widely studied as an independent subject. Essentially, kernel methods are nonlinear transformation techniques that take data from an input set to a high (possibly infinite) dimensional vector space, called the Reproducing Kernel Hilbert Space (RKHS), in which linear models can be applied. The original input set can contain data from different domains and applications, such as tweets, movie ratings, images, and medical measurements. The two spaces are connected by a Positive Semi-Definite (PSD) kernel function, and all computations in the RKHS are evaluated on the low-dimensional input set using the kernel function.
Kernel methods have proven effective in various applications. However, the computational complexity of most kernel algorithms typically grows cubically, or at least quadratically, with the training set size, because a Gram kernel matrix needs to be constructed and/or inverted. To improve scalability for large-scale training, kernel approximation techniques are employed in which the kernel matrix is assumed to have a low-rank structure. Essentially, this is equivalent to assuming a subspace model spanned by a subset of the training data in the RKHS. The task is hence to estimate the subspace with respect to some criterion, such as the reconstruction error or the discriminative power for classification tasks.
Based on these motivations, this thesis focuses on the development of scalable kernel techniques for supervised classification problems. Inspired by the idea of the subspace classifier and kernel clustering models, we have proposed the CLAss-specific Subspace Kernel (CLASK) representation, where class-specific kernel functions are applied and individual subspaces can be constructed accordingly. In this thesis work, an automatic model selection technique is proposed to choose the best multiple kernel functions for each class based on a criterion using the subspace projection distance. Moreover, subset selection and transformation techniques using CLASK are developed to further reduce the model complexity with an enhanced discriminative power for kernel approximation and classification. Furthermore, we have also proposed both a parallel and a sequential framework to tackle large-scale learning problems.
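The low-rank kernel approximation referred to above can be made concrete with the standard Nystroem construction sketched below. This is not the CLASK model itself, only the generic subspace idea: approximate the full Gram matrix from a small set of landmark points, which yields an explicit low-dimensional feature map. The landmark count, kernel, and parameters are chosen arbitrarily for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 20))                   # training data
m = 100                                           # number of landmark points (m << n)

# Nystroem: approximate the n-by-n Gram matrix from an n-by-m slice.
idx = rng.choice(len(X), size=m, replace=False)
landmarks = X[idx]

C = rbf_kernel(X, landmarks, gamma=0.1)           # n x m cross-kernel
W = rbf_kernel(landmarks, landmarks, gamma=0.1)   # m x m landmark block

# Rank-m approximation K ~ C W^+ C^T; equivalently, map each sample to
# phi(x) = W^{-1/2} k(landmarks, x), whose inner products reproduce the approximation.
eigval, eigvec = np.linalg.eigh(W)
eigval = np.clip(eigval, 1e-12, None)
W_inv_sqrt = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
Phi = C @ W_inv_sqrt                              # explicit low-rank feature map, n x m

# Check approximation quality on a small block.
K_exact = rbf_kernel(X[:200], X[:200], gamma=0.1)
K_approx = Phi[:200] @ Phi[:200].T
print("max abs error on a 200x200 block:", np.abs(K_exact - K_approx).max())
```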
In kernel methods such as kernel principal component analysis (PCA) and support vector machines, the so-called kernel trick is used to avoid direct calculations in a high (virtually infinite) dimensional kernel space. In this brief, based on the fact that the effective dimensionality of a kernel space is less than the number of training samples, we propose an alternative to the kernel trick that explicitly maps the input data into a reduced-dimensional kernel space. The mapping is obtained easily from the eigenvalue decomposition of the kernel matrix. The proposed method is named the nonlinear projection trick, in contrast to the kernel trick. With this technique, the applicability of kernel methods is widened to arbitrary algorithms that do not use the dot product. The equivalence between the kernel trick and the nonlinear projection trick is shown for several conventional kernel methods. In addition, we extend PCA-L1, which uses the L1-norm instead of the L2-norm (or dot product), into a kernel version and show the effectiveness of the proposed approach.
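The nonlinear projection trick can be sketched directly from the abstract: eigendecompose the centered kernel matrix, keep the non-negligible eigenvalues, and use them to map both training and unseen points to explicit coordinates on which any linear algorithm (including L1-norm variants) can run. The RBF kernel and tolerance below are illustrative choices; the paper treats the construction in general form.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def npt_fit(X, gamma=0.1, tol=1e-10):
    """Explicitly embed training data into the reduced kernel space via
    eigendecomposition of the centered kernel matrix."""
    n = len(X)
    K = rbf_kernel(X, X, gamma=gamma)
    H = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    Kc = H @ K @ H
    lam, U = np.linalg.eigh(Kc)
    keep = lam > tol                              # effective dimensionality is below n
    lam, U = lam[keep], U[:, keep]
    Y = U * np.sqrt(lam)                          # training embedding: Y @ Y.T reconstructs Kc
    proj = U / np.sqrt(lam)                       # maps centered kernel rows to coordinates
    return Y, proj, K, H

def npt_transform(X_train, X_new, proj, K, H, gamma=0.1):
    """Embed unseen points into the same reduced space, without the kernel trick."""
    n = len(X_train)
    k = rbf_kernel(X_new, X_train, gamma=gamma)
    k_centered = (k - np.ones((len(X_new), n)) / n @ K) @ H
    return k_centered @ proj

rng = np.random.default_rng(0)
X_train, X_new = rng.normal(size=(100, 5)), rng.normal(size=(10, 5))
Y, proj, K, H = npt_fit(X_train)
Y_new = npt_transform(X_train, X_new, proj, K, H)
print(Y.shape, Y_new.shape)
```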
Support Vector Machines (SVM) are among the most widely used machine learning algorithms in the field of pattern recognition, and the choice of kernel function has a direct impact on SVM performance. This paper analyzes seven common kernel functions, namely the linear, polynomial, radial basis function (RBF), sigmoid, Fourier, B-spline, and wavelet kernels, and adopts a new compound kernel function. The compound kernel combines three of the common kernel functions and offers better generalization ability and better learning ability. Experimental results show the superiority of the compound kernel function.
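Since the abstract does not state which three kernels are combined or how, the sketch below uses an arbitrary weighted sum of linear, polynomial, and RBF kernels as a stand-in compound kernel and passes it to scikit-learn's SVC as a custom kernel; the weights, parameters, and dataset are illustrative only.

```python
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel
from sklearn.svm import SVC
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

# Illustrative compound kernel: a weighted sum of three common kernels.
# Any such non-negative combination of valid kernels is itself a valid kernel.
def compound_kernel(A, B):
    return (0.2 * linear_kernel(A, B)
            + 0.3 * polynomial_kernel(A, B, degree=2)
            + 0.5 * rbf_kernel(A, B, gamma=0.01))

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)   # put features on comparable scales for all three kernels

clf = SVC(kernel=compound_kernel, C=1.0)
print("5-fold accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```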
Sparse representation classification (SRC) and kernel methods have been successfully used in pattern recognition. On account of the limitations of a single kernel function, we propose a multiple kernel sparse classification method to improve the face recognition rate. The Power kernel function has good stability and the Gaussian kernel function has good practicability, and the two are linearly combined. Through the transformation into different kernel spaces, the nonlinear structure information of the human face is effectively extracted. Extensive experimental results show that the multiple kernel sparse representation classification algorithm based on the Power and Gaussian kernel functions achieves a higher recognition rate than sparse representation classification using a single kernel.
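Only the kernel combination at the heart of the method is sketched below: the Power kernel, k(x, y) = -||x - y||^beta, mixed linearly with a Gaussian kernel. The mixing weight, kernel parameters, and random stand-in data are illustrative, and the sparse-representation classifier built on top of the combined kernel is omitted.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # Gaussian (RBF) kernel on squared Euclidean distances.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def power_kernel(A, B, beta=1.0):
    # Conditionally positive definite Power kernel: k(x, y) = -||x - y||^beta.
    d = np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))
    return -d ** beta

def combined_kernel(A, B, lam=0.5, sigma=1.0, beta=1.0):
    # Linear combination of the two kernels; lam is an illustrative mixing weight,
    # not the value used in the paper.
    return lam * power_kernel(A, B, beta) + (1 - lam) * gaussian_kernel(A, B, sigma)

rng = np.random.default_rng(0)
gallery = rng.normal(size=(40, 64))   # stand-in for vectorized face images (the dictionary)
probe = rng.normal(size=(5, 64))      # stand-in for test faces
K = combined_kernel(probe, gallery)
print(K.shape)                        # (5, 40) similarities fed into the sparse coding step
```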
This paper studies and analyses the formation conditions and characteristics of the kernel functions used in the kernel principal component analysis (KPCA) algorithm. KPCA is a statistical signal processing technique that can extract nonlinear features of images, and the kernel function is a key element in improving its performance. A new combination kernel function is proposed that joins a local kernel with a global kernel: the local kernel is a conditionally positive definite kernel that captures local features of images, while the global kernel is a polynomial kernel that captures global features. The new kernel function can therefore extract both local and global image features. It is applied in KPCA to extract features from the MNIST handwritten digits and the ORL face database; classification is then performed with linear support vector machines, and the effect of the new kernel on KPCA is compared with that of other kernels. The experimental results indicate that the new kernel function clearly improves the performance of kernel principal component analysis.
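The evaluation pipeline described here, a combination kernel fed into kernel PCA followed by a linear SVM, can be sketched as below. Because the abstract names the global part (a polynomial kernel) but not the exact conditionally positive definite local kernel, an RBF kernel is used as a stand-in for the local part, and scikit-learn's digits dataset stands in for MNIST; weights and parameters are illustrative.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.metrics.pairwise import rbf_kernel, polynomial_kernel
from sklearn.decomposition import KernelPCA
from sklearn.svm import LinearSVC

def combination_kernel(A, B):
    # Local part (stand-in: RBF, sensitive to nearby samples) plus
    # global part (polynomial, sensitive to samples far apart).
    return 0.5 * rbf_kernel(A, B, gamma=1e-3) + 0.5 * polynomial_kernel(A, B, degree=2, gamma=1e-3)

X, y = load_digits(return_X_y=True)              # stand-in for MNIST digits
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Feature extraction: kernel PCA on the precomputed combination kernel.
kpca = KernelPCA(n_components=40, kernel="precomputed")
Z_tr = kpca.fit_transform(combination_kernel(X_tr, X_tr))
Z_te = kpca.transform(combination_kernel(X_te, X_tr))

# Classification with a linear SVM, mirroring the paper's evaluation protocol.
clf = LinearSVC(max_iter=5000).fit(Z_tr, y_tr)
print("test accuracy:", clf.score(Z_te, y_te))
```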
Kernel methods have been successfully applied to pattern recognition and data mining. In this paper, we mainly discuss the issue of propagating labels in kernel space. A Kernel-Induced Label Propagation (Kernel-LP) framework by mapping is proposed for high-dimensional data classification using the most informative patterns of the data in kernel space. The essence of Kernel-LP is to perform joint label propagation and adaptive weight learning in a transformed kernel space; that is, Kernel-LP moves the task of label propagation from the Euclidean space commonly used in existing work to kernel space. The motivation for propagating labels and learning the adaptive weights jointly is the assumption of an inner product space of inputs, i.e., originally linearly inseparable inputs may become separable after mapping to kernel space. Kernel-LP builds on an existing positive and negative LP model, i.e., the effects of negative label information are integrated to improve label prediction power. Kernel-LP also performs adaptive weight construction over the same kernel space, so it avoids the tricky process of choosing an optimal neighborhood size from which traditional criteria suffer. Two novel and efficient out-of-sample approaches for involving new test data are also presented: (1) direct kernel mapping and (2) kernel-mapping-induced label reconstruction, both of which depend purely on the kernel matrix between the training and testing sets. Owing to the kernel trick, the algorithms are applicable to high-dimensional real data. Extensive results on real datasets demonstrate the effectiveness of the approach.
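A minimal sketch of propagating labels over a kernel-induced affinity graph is given below. It uses fixed, symmetrically normalized kernel weights rather than Kernel-LP's jointly learned adaptive weights, and it omits the negative-label component and the out-of-sample extensions; data, kernel, and parameters are illustrative.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def kernel_label_propagation(X, y, labeled_idx, gamma=0.5, alpha=0.99, n_iter=200):
    """Propagate labels over a kernel-induced affinity graph.
    y: integer labels; only the entries in labeled_idx are treated as known."""
    n, c = len(X), int(y.max()) + 1
    W = rbf_kernel(X, X, gamma=gamma)
    np.fill_diagonal(W, 0.0)                      # no self-affinity
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))               # symmetrically normalized affinities
    Y0 = np.zeros((n, c))
    Y0[labeled_idx, y[labeled_idx]] = 1           # clamp the labeled points
    F = Y0.copy()
    for _ in range(n_iter):
        F = alpha * (S @ F) + (1 - alpha) * Y0    # iterative propagation
    return F.argmax(axis=1)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
labeled_idx = np.r_[0:5, 50:55]                   # five labeled samples per class
pred = kernel_label_propagation(X, y, labeled_idx)
mask = np.ones(len(X), dtype=bool)
mask[labeled_idx] = False
print("accuracy on unlabeled points:", (pred[mask] == y[mask]).mean())
```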