
Jaccard index

The Jaccard index, also known as Intersection over Union and the Jaccard similarity coefficient (originally given the French name coefficient de communauté by Paul Jaccard), is a statistic used for gauging the similarity and diversity of sample sets. The Jaccard coefficient measures similarity between finite sample sets, and is defined as the size of the intersection divided by the size of the union of the sample sets:

$$J(A,B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}.$$

(If $A$ and $B$ are both empty, we define $J(A,B) = 1$.)

The Jaccard distance, which measures dissimilarity between sample sets, is complementary to the Jaccard coefficient and is obtained by subtracting the Jaccard coefficient from 1, or, equivalently, by dividing the difference of the sizes of the union and the intersection of two sets by the size of the union:

$$d_J(A,B) = 1 - J(A,B) = \frac{|A \cup B| - |A \cap B|}{|A \cup B|}.$$

An alternate interpretation of the Jaccard distance is as the ratio of the size of the symmetric difference $A \triangle B = (A \cup B) - (A \cap B)$ to the size of the union. This distance is a metric on the collection of all finite sets.

There is also a version of the Jaccard distance for measures, including probability measures. If $\mu$ is a measure on a measurable space $X$, then we define the Jaccard coefficient by

$$J_\mu(A,B) = \frac{\mu(A \cap B)}{\mu(A \cup B)},$$

and the Jaccard distance by

$$d_\mu(A,B) = 1 - J_\mu(A,B) = \frac{\mu(A \triangle B)}{\mu(A \cup B)}.$$

Care must be taken if $\mu(A \cup B) = 0$ or $\infty$, since these formulas are not well defined in these cases.

MinHash, a locality-sensitive hashing scheme based on min-wise independent permutations, may be used to efficiently compute an accurate estimate of the Jaccard similarity coefficient of pairs of sets, where each set is represented by a constant-sized signature derived from the minimum values of a hash function.

Given two objects, A and B, each with n binary attributes, the Jaccard coefficient is a useful measure of the overlap that A and B share with their attributes. Each attribute of A and B can either be 0 or 1, so every attribute falls into one of four combinations: 1 in both A and B, 0 in A and 1 in B, 1 in A and 0 in B, or 0 in both. The Jaccard coefficient is then the number of attributes that are 1 in both objects divided by the number of attributes that are 1 in at least one of them.
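
The set-based definitions can be tried out directly. The following is a minimal Python sketch, not a reference implementation: the function names `jaccard_index`, `jaccard_distance`, and `jaccard_binary` are illustrative choices, and returning 1 for two all-zero attribute vectors is an assumption made here by analogy with the empty-set convention.

```python
def jaccard_index(a, b):
    """Return |A ∩ B| / |A ∪ B| for two finite collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Convention from the text: J(A, B) = 1 when both sets are empty.
        return 1.0
    return len(a & b) / len(union)


def jaccard_distance(a, b):
    """Return |A △ B| / |A ∪ B|, which equals 1 - J(A, B)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Follows from J = 1 for two empty sets.
        return 0.0
    return len(a ^ b) / len(union)  # ^ is the symmetric difference


def jaccard_binary(x, y):
    """Jaccard coefficient of two equal-length sequences of 0/1 attributes."""
    if len(x) != len(y):
        raise ValueError("attribute vectors must have the same length")
    both = sum(1 for p, q in zip(x, y) if p == 1 and q == 1)
    either = sum(1 for p, q in zip(x, y) if p == 1 or q == 1)
    # Assumption: treat two all-zero vectors like two empty sets.
    return both / either if either else 1.0


print(jaccard_index({1, 2, 3}, {2, 3, 4}))     # 2 shared out of 4 total -> 0.5
print(jaccard_distance({1, 2, 3}, {2, 3, 4}))  # 2 unshared out of 4 total -> 0.5
```

Attributes that are 0 in both objects do not enter `jaccard_binary` at all, which is what separates the Jaccard coefficient from a simple matching score.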

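The MinHash idea mentioned above can be illustrated with a short Python sketch. It rests on assumptions that are not part of the article: a family of hash functions is simulated by salting BLAKE2b with an integer seed instead of using true min-wise independent permutations, and the names `minhash_signature`, `estimate_jaccard`, and `num_hashes` are hypothetical. The fraction of signature positions on which two sets agree serves as the estimate of their Jaccard coefficient.

```python
import hashlib


def _seeded_hash(seed, item):
    """64-bit hash of `item` salted with `seed` (stand-in for one hash function)."""
    data = f"{seed}:{item!r}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes=256):
    """Fixed-size signature: the minimum hash value under each seeded hash."""
    items = set(items)
    if not items:
        raise ValueError("this sketch does not define a signature for an empty set")
    return [min(_seeded_hash(seed, x) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of positions where the two signatures hold the same minimum."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(1 for u, v in zip(sig_a, sig_b) if u == v)
    return matches / len(sig_a)


a = set(range(0, 100))
b = set(range(50, 150))
estimate = estimate_jaccard(minhash_signature(a), minhash_signature(b))
print(estimate)  # an approximation of the exact value 50 / 150
```

Once signatures are stored, each comparison costs time proportional to `num_hashes` regardless of set size, and a longer signature reduces the variance of the estimate.
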
[ "Ecology", "Botany", "Statistics", "Artificial intelligence", "Mathematical analysis", "MinHash" ]