Diffusion maps is a dimensionality reduction or feature extraction algorithm introduced by Coifman and Lafon which computes a family of embeddings of a data set into Euclidean space (often low-dimensional) whose coordinates can be computed from the eigenvectors and eigenvalues of a diffusion operator on the data. The Euclidean distance between points in the embedded space is equal to the 'diffusion distance' between probability distributions centered at those points. Unlike linear dimensionality reduction methods such as principal component analysis (PCA) and multi-dimensional scaling (MDS), diffusion maps belongs to the family of nonlinear dimensionality reduction methods, which focus on discovering the underlying manifold from which the data have been sampled. By integrating local similarities at different scales, diffusion maps give a global description of the data set. Compared with other methods, the diffusion map algorithm is robust to noise perturbation and computationally inexpensive.

Diffusion maps can be defined in four steps. They exploit the relationship between heat diffusion and random-walk Markov chains. The basic observation is that if we take a random walk on the data, walking to a nearby data point is more likely than walking to one that is far away. Let $(X, \mathcal{A}, \mu)$ be a measure space, where $X$ is the data set and $\mu$ represents the distribution of the points on $X$.

Based on this, the connectivity $k$ between two data points, $x$ and $y$, can be defined as the probability of walking from $x$ to $y$ in one step of the random walk. Usually, this probability is specified in terms of a kernel function of the two points, $k : X \times X \to \mathbb{R}$. For example, the popular Gaussian kernel:

$$k(x, y) = \exp\left(-\frac{\lVert x - y \rVert^2}{\epsilon}\right)$$

More generally, the kernel function has the following properties:

$$k(x, y) = k(y, x)$$ ($k$ is symmetric), and

$$k(x, y) \geq 0$$ ($k$ is positivity preserving).

The kernel constitutes the prior definition of the local geometry of the data set. Since a given kernel will capture a specific feature of the data set, its choice should be guided by the application that one has in mind. This is a major difference from methods such as principal component analysis, where correlations between all data points are taken into account at once.
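As a concrete illustration of this first step, the following is a minimal NumPy sketch that evaluates the Gaussian kernel on a finite data set; the bandwidth `eps`, the function name, and the noisy-circle example data are assumptions made for illustration, not part of the original definition.

```python
import numpy as np

def gaussian_kernel_matrix(X, eps=1.0):
    """Return the n x n matrix K with K[i, j] = exp(-||x_i - x_j||^2 / eps)."""
    # Pairwise squared Euclidean distances via broadcasting: shape (n, n).
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / eps)

# Usage: 200 points sampled from a noisy circle (a simple 1-D manifold in R^2).
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 200)
X = np.column_stack([np.cos(theta), np.sin(theta)]) + 0.05 * rng.normal(size=(200, 2))

K = gaussian_kernel_matrix(X, eps=0.5)
# The two kernel properties from the text hold for this matrix:
assert np.allclose(K, K.T)  # symmetric
assert (K >= 0).all()       # positivity preserving
```

Note that `eps` controls the scale below which points are considered "connected"; choosing it is exactly the kind of application-dependent decision the text describes.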
Given $(X, k)$, we can then construct a reversible Markov chain on $X$ (a process known as the normalized graph Laplacian construction):
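As a rough numerical sketch of this normalization step, assuming the kernel has already been evaluated into a matrix `K` as in the snippet above (the helper name is hypothetical):

```python
import numpy as np

def markov_transition_matrix(K):
    """Row-normalize a kernel matrix into a Markov transition matrix.

    d[i] = sum_j K[i, j] acts as the 'degree' of point i, and
    P[i, j] = K[i, j] / d[i] is the one-step probability of walking i -> j.
    """
    d = K.sum(axis=1)
    return K / d[:, None]

# Usage with a tiny example kernel matrix:
K = np.array([[1.0, 0.5, 0.1],
              [0.5, 1.0, 0.4],
              [0.1, 0.4, 1.0]])
P = markov_transition_matrix(K)
assert np.allclose(P.sum(axis=1), 1.0)  # each row is a probability distribution
```

Because $K$ is symmetric, the chain defined by $P$ satisfies detailed balance with respect to the measure proportional to $d$, which is what makes the Markov chain reversible.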