language-icon Old Web
English
Sign In

Earth mover's distance

In statistics, the earth mover's distance (EMD) is a measure of the distance between two probability distributions over a region D. In mathematics, this is known as the Wasserstein metric. Informally, if the distributions are interpreted as two different ways of piling up a certain amount of dirt over the region D, the EMD is the minimum cost of turning one pile into the other; where the cost is assumed to be amount of dirt moved times the distance by which it is moved. In statistics, the earth mover's distance (EMD) is a measure of the distance between two probability distributions over a region D. In mathematics, this is known as the Wasserstein metric. Informally, if the distributions are interpreted as two different ways of piling up a certain amount of dirt over the region D, the EMD is the minimum cost of turning one pile into the other; where the cost is assumed to be amount of dirt moved times the distance by which it is moved. The above definition is valid only if the two distributions have the same integral (informally, if the two piles have the same amount of dirt), as in normalized histograms or probability density functions. In that case, the EMD is equivalent to the 1st Mallows distance or 1st Wasserstein distance between the two distributions. Assume that we have a set of points in R d { extstyle mathbb {R} ^{d}} (dimension d { extstyle d} ). Instead of assigning one distribution to the set of points, we can cluster them and represent the point set in terms of the clusters. Thus, each cluster is a single point in R d { extstyle mathbb {R} ^{d}} and the weight of the cluster is decided by the fraction of the distribution present in that cluster. This representation of a distribution by a set of clusters is called the signature. Two signatures can have different sizes, for example, a bimodal distribution has shorter signature (2 clusters) than complex ones. One cluster representation (mean or mode in R d { extstyle mathbb {R} ^{d}} ) can be thought of as a single feature in a signature. The distance between each of the features is called as ground distance. Earth Mover's Distance can be formulated and solved as a transportation problem. Suppose that several suppliers, each with a given amount of goods, are required to supply several consumers, each with a given limited capacity. For each supplier-consumer pair, the cost of transporting a single unit of goods is given. The transportation problem is then to find a least-expensive flow of goods from the suppliers to the consumers that satisfies the consumers' demand. Similarly, here the problem is transforming one signature( P { extstyle P} ) to another( Q { extstyle Q} ) with minimum work done. Assume that signature P { extstyle P} has m { extstyle m} clusters with P = { ( p 1 , w p 1 ) , ( p 2 , w p 2 ) , . . . , ( p m , w p m ) } { extstyle P={(p_{1},w_{p1}),(p_{2},w_{p2}),...,(p_{m},w_{pm})}} , where p i { extstyle p_{i}} is the cluster representative and w p i { extstyle w_{pi}} is the weight of the cluster. Similarly another signature Q = { ( q 1 , w q 1 ) , ( q 2 , w q 2 ) , . . . , ( q n , w q n ) } { extstyle Q={(q_{1},w_{q1}),(q_{2},w_{q2}),...,(q_{n},w_{qn})}} has n { extstyle n} clusters. Let D = [ d i , j ] { extstyle D=} be the ground distance between clusters p i { extstyle p_{i}} and q j { extstyle q_{j}} . We want to find a flow F = [ f i , j ] { extstyle F=} , with f i , j { extstyle f_{i,j}} the flow between p i { extstyle p_{i}} and q j { extstyle q_{j}} , that minimizes the overall cost.

[ "Algorithm", "Machine learning", "Artificial intelligence", "Pattern recognition", "Computer vision" ]
Parent Topic
Child Topic
    No Parent Topic