
Multidimensional scaling

Multidimensional scaling (MDS) is a means of visualizing the level of similarity of individual cases of a dataset. MDS is used to translate "information about the pairwise 'distances' among a set of n objects or individuals" into a configuration of n points mapped into an abstract Cartesian space.

More technically, MDS refers to a set of related ordination techniques used in information visualization, in particular to display the information contained in a distance matrix. It is a form of non-linear dimensionality reduction. Given a distance matrix with the distances between each pair of objects in a set, and a chosen number of dimensions, N, an MDS algorithm places each object into N-dimensional space such that the between-object distances are preserved as well as possible. If N is one or two, the resulting points can be shown in a 2D scatter plot.

MDS algorithms fall into a taxonomy, depending on the meaning of the input matrix. The types described here are classical, metric, and non-metric MDS.

Classical multidimensional scaling is also known as Principal Coordinates Analysis (PCoA), Torgerson Scaling or Torgerson–Gower scaling. It takes an input matrix giving dissimilarities between pairs of items and outputs a coordinate matrix whose configuration minimizes a loss function called strain. For example, given the aerial distances between many cities in a matrix $D = [d_{ij}]$, where $d_{ij}$ is the distance between the coordinates of the $i$-th and $j$-th city, given by $d_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$, you want to find the coordinates of the cities. This problem is addressed in classical MDS.

The general loss functions are called stress in distance MDS and strain in classical MDS. The strain is given by:

$$\mathrm{Strain}_D(x_1, x_2, \ldots, x_n) = \left( \frac{\sum_{i,j} \bigl( b_{ij} - \langle x_i, x_j \rangle \bigr)^2}{\sum_{i,j} b_{ij}^2} \right)^{1/2},$$

where $b_{ij}$ are the entries of the matrix $B$, obtained in classical MDS by double centering the squared distances: $B = -\tfrac{1}{2} J D^{(2)} J$, with $D^{(2)} = [d_{ij}^2]$ and $J = I - \tfrac{1}{n} \mathbf{1}\mathbf{1}^{T}$.

Metric multidimensional scaling is a superset of classical MDS that generalizes the optimization procedure to a variety of loss functions and input matrices of known distances with weights and so on. A useful loss function in this context is called stress, which is often minimized using a procedure called stress majorization. Metric MDS minimizes the cost function called "stress", which is a residual sum of squares:

$$\mathrm{Stress}_D(x_1, x_2, \ldots, x_n) = \left( \sum_{i \neq j = 1, \ldots, n} \bigl( d_{ij} - \| x_i - x_j \| \bigr)^2 \right)^{1/2}$$

or, in normalized form,

$$\mathrm{Stress}_D(x_1, x_2, \ldots, x_n) = \left( \frac{\sum_{i,j} \bigl( d_{ij} - \| x_i - x_j \| \bigr)^2}{\sum_{i,j} d_{ij}^2} \right)^{1/2}.$$

In contrast to metric MDS, non-metric MDS finds both a non-parametric monotonic relationship between the dissimilarities in the item-item matrix and the Euclidean distances between items, and the location of each item in the low-dimensional space. The relationship is typically found using isotonic regression: let $x$ denote the vector of proximities, $f(x)$ a monotonic transformation of $x$, and $d$ the point distances; then coordinates have to be found that minimize the so-called stress,

$$\mathrm{Stress} = \sqrt{\frac{\sum \bigl( f(x) - d \bigr)^2}{\sum d^2}}.$$
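
As a rough illustration of the classical MDS case described above, the following Python sketch (assuming NumPy is available) double-centers the squared distance matrix to obtain B, keeps the eigenvectors belonging to the largest eigenvalues, and then evaluates the strain and the normalized stress of the result. The function names, the parameter `m` (the number of output dimensions), and the city coordinates are illustrative assumptions and do not come from the text.

```python
import numpy as np


def classical_mds(D, m=2):
    """Embed n objects in m dimensions from an n x n distance matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Double centering: B = -1/2 * J * D^(2) * J, with J = I - (1/n) * 1 1^T.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # eigh handles symmetric matrices and returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(B)
    top = np.argsort(eigvals)[::-1][:m]
    # Clip small negative eigenvalues (rounding or non-Euclidean input).
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    X = eigvecs[:, top] * scale
    return X, B


def strain(B, X):
    """Normalized strain between B and the Gram matrix of X."""
    G = X @ X.T
    return float(np.sqrt(np.sum((B - G) ** 2) / np.sum(B ** 2)))


def normalized_stress(D, X):
    """Normalized stress between D and the pairwise distances of X."""
    D = np.asarray(D, dtype=float)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return float(np.sqrt(np.sum((D - dist) ** 2) / np.sum(D ** 2)))


# Hypothetical planar "city" coordinates, used only to build a distance matrix.
cities = np.array([[0.0, 0.0], [3.0, 0.0], [3.0, 4.0], [0.0, 4.0], [1.0, 2.0]])
D = np.linalg.norm(cities[:, None, :] - cities[None, :, :], axis=-1)

X, B = classical_mds(D, m=2)
print("strain:", strain(B, X))
print("stress:", normalized_stress(D, X))
```

Because the example distances come from points that really lie in a plane, both values should be close to zero. The recovered coordinates generally differ from the original ones by a rotation, reflection, or translation, since distances alone do not fix those.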
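
The non-metric stress can be illustrated for one fixed configuration of points. The sketch below is only an assumption-laden example: it fits the monotonic transformation f(x) with a small pool-adjacent-violators routine (one common way to compute an isotonic regression), then evaluates the stress formula. The names `isotonic_fit` and `nonmetric_stress` are hypothetical, ties among proximities are not treated specially, and the outer loop that a full non-metric MDS would use to update the coordinates is not shown.

```python
import math


def isotonic_fit(y):
    """Pool-adjacent-violators: least-squares non-decreasing fit to y."""
    blocks = []  # each block stores [sum of values, number of values]
    for value in y:
        blocks.append([value, 1])
        # Merge backwards while neighbouring block means are out of order.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


def nonmetric_stress(proximities, distances):
    """Stress = sqrt(sum (f(x) - d)^2 / sum d^2) for a fixed configuration."""
    # Order the point distances d by increasing proximity x.
    order = sorted(range(len(proximities)), key=lambda k: proximities[k])
    d_sorted = [distances[k] for k in order]
    # f(x): the monotone values closest to d in the least-squares sense.
    f_sorted = isotonic_fit(d_sorted)
    num = sum((f - d) ** 2 for f, d in zip(f_sorted, d_sorted))
    den = sum(d ** 2 for d in d_sorted)
    return math.sqrt(num / den)


# Made-up proximities x and point distances d for four object pairs.
x = [1, 2, 3, 4]
d = [1.0, 3.0, 2.0, 4.0]
print(isotonic_fit(d))         # [1.0, 2.5, 2.5, 4.0]
print(nonmetric_stress(x, d))  # about 0.129
```

The stress is zero exactly when the point distances are already in the same order as the proximities, which is why non-metric MDS depends only on the rank order of the input dissimilarities.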
