
Neighbourhood components analysis

Neighbourhood components analysis is a supervised learning method for classifying multivariate data into distinct classes according to a given distance metric over the data. Functionally, it serves the same purposes as the $k$-nearest neighbours algorithm, and makes direct use of a related concept termed stochastic nearest neighbours.

Neighbourhood components analysis aims at "learning" a distance metric by finding a linear transformation of the input data such that the average leave-one-out (LOO) classification performance is maximized in the transformed space. The key insight of the algorithm is that a matrix $A$ corresponding to the transformation can be found by defining a differentiable objective function for $A$, followed by the use of an iterative solver such as conjugate gradient descent. One benefit of this algorithm is that the number of classes $k$ can be determined as a function of $A$, up to a scalar constant; this use of the algorithm therefore addresses the issue of model selection.

In order to define $A$, we define an objective function describing classification accuracy in the transformed space and try to determine the $A^*$ that maximizes it:

$$A^* = \operatorname{argmax}_A f(A)$$

Consider predicting the class label of a single data point by consensus of its $k$ nearest neighbours under a given distance metric. This is known as leave-one-out classification. However, the set of nearest neighbours $C_i$ can be quite different after passing all the points through a linear transformation. Specifically, the set of neighbours for a point can undergo discrete changes in response to smooth changes in the elements of $A$, implying that any objective function $f(\cdot)$ based on the neighbours of a point will be piecewise constant, and hence not differentiable.

We can resolve this difficulty by using an approach inspired by stochastic gradient descent. Rather than considering the $k$ nearest neighbours at each transformed point in LOO classification, we consider the entire transformed data set as stochastic nearest neighbours. We define these using a softmax function of the squared Euclidean distance between a given LOO-classification point and each other point in the transformed space:

$$p_{ij} = \begin{cases} \dfrac{e^{-\lVert Ax_i - Ax_j \rVert^2}}{\sum_{k \neq i} e^{-\lVert Ax_i - Ax_k \rVert^2}}, & \text{if } j \neq i \\[1ex] 0, & \text{if } j = i \end{cases}$$

The probability of correctly classifying data point $i$ is then the total probability of selecting a stochastic neighbour from its own class $C_i$:

$$p_i = \sum_{j \in C_i} p_{ij}$$

where $p_{ij}$ is the probability of classifying neighbour $j$ of point $i$. The objective maximized above is the expected number of correctly classified points, $f(A) = \sum_i p_i$.
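As a concrete illustration, here is a minimal NumPy sketch of this construction: it computes the stochastic-neighbour probabilities $p_{ij}$ and the objective $f(A) = \sum_i p_i$, and maximizes the objective with a generic numerical optimizer (SciPy's L-BFGS-B with finite-difference gradients) rather than the conjugate-gradient solver with an analytic gradient mentioned above. The names `nca_objective` and `fit_nca` are illustrative, not from any library.

```python
import numpy as np
from scipy.optimize import minimize

def nca_objective(A_flat, X, y, d_out):
    """f(A) = sum_i p_i: expected number of correctly classified
    points under stochastic nearest neighbours."""
    n, d = X.shape
    A = A_flat.reshape(d_out, d)
    Z = X @ A.T                                          # transformed points, shape (n, d_out)
    sq = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)  # squared Euclidean distances
    np.fill_diagonal(sq, np.inf)                         # enforce p_ii = 0
    logits = -sq
    logits -= logits.max(axis=1, keepdims=True)          # stabilize the softmax
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)                    # rows are p_ij, denominator over k != i
    same_class = (y[:, None] == y[None, :])              # mask for j in C_i
    return (P * same_class).sum()                        # sum_i p_i

def fit_nca(X, y, d_out, seed=0):
    """Maximize f(A) over A (shape d_out x d) from a small random start."""
    d = X.shape[1]
    rng = np.random.default_rng(seed)
    A0 = rng.normal(scale=0.1, size=(d_out, d)).ravel()
    res = minimize(lambda a: -nca_objective(a, X, y, d_out), A0, method="L-BFGS-B")
    return res.x.reshape(d_out, d)

if __name__ == "__main__":
    # Two well-separated Gaussian classes as toy data.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (20, 4)), rng.normal(3, 1, (20, 4))])
    y = np.array([0] * 20 + [1] * 20)
    A = fit_nca(X, y, d_out=2)
    print("f(A) =", nca_objective(A.ravel(), X, y, 2), "out of", len(X))
```

Note that because $p_{ii} = 0$ by construction, each point's own position never votes for its class, which is exactly the leave-one-out property the softmax formulation preserves.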

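In practice, scikit-learn ships an implementation of this algorithm as `sklearn.neighbors.NeighborhoodComponentsAnalysis`. A short sketch of typical usage, pairing the learned transformation with a $k$-nearest-neighbours classifier:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NeighborhoodComponentsAnalysis, KNeighborsClassifier
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Learn the linear transformation A, then classify with k-NN in the transformed space.
nca_knn = make_pipeline(NeighborhoodComponentsAnalysis(random_state=0),
                        KNeighborsClassifier(n_neighbors=3))
nca_knn.fit(X_train, y_train)
print("test accuracy:", nca_knn.score(X_test, y_test))
```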
[ "Nonlinear conjugate gradient method", "Stochastic gradient descent" ]
Parent Topic
Child Topic
    No Parent Topic