An association-based dissimilarity measure for categorical data

Si Quang Le, Tu Bao Ho

2005

In this paper, we propose a novel method to measure the dissimilarity of categorical data. The key idea is to consider the dissimilarity between two categorical values of an attribute as a combination of dissimilarities between the conditional probability distributions of other attributes given these two values. Experiments with real data show that our dissimilarity estimation method improves the accuracy of the popular nearest neighbor classifier.
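
The key idea can be illustrated with a minimal Python sketch. The abstract does not say which distribution dissimilarity is used or how the per-attribute terms are combined, so this sketch assumes total variation distance and a plain average over the other attributes; the function names (`conditional_distribution`, `total_variation`, `value_dissimilarity`, `record_dissimilarity`) and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def conditional_distribution(rows, target_attr, given_attr, given_value):
    """Estimate P(target_attr | given_attr == given_value) from frequency counts."""
    counts = Counter(r[target_attr] for r in rows if r[given_attr] == given_value)
    total = sum(counts.values())
    # A value never seen in the data yields an empty distribution.
    return {v: c / total for v, c in counts.items()} if total else {}


def total_variation(p, q):
    """Total variation distance between two discrete distributions (assumed choice)."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)


def value_dissimilarity(rows, attr, a, b):
    """Dissimilarity between values a and b of attribute `attr`.

    Compares the conditional distributions of every other attribute given a
    and given b, then averages the results (averaging is an assumption).
    """
    if a == b:
        return 0.0
    others = [j for j in range(len(rows[0])) if j != attr]
    return sum(
        total_variation(
            conditional_distribution(rows, j, attr, a),
            conditional_distribution(rows, j, attr, b),
        )
        for j in others
    ) / len(others)


def record_dissimilarity(rows, x, y):
    """Dissimilarity between two records: sum of per-attribute value dissimilarities."""
    return sum(value_dissimilarity(rows, i, x[i], y[i]) for i in range(len(x)))


# Toy data set: (color, shape, taste). Purely illustrative.
rows = [
    ("red", "round", "sweet"),
    ("red", "round", "sweet"),
    ("green", "round", "sour"),
    ("yellow", "long", "sweet"),
]

d_red_green = value_dissimilarity(rows, 0, "red", "green")
d_green_yellow = value_dissimilarity(rows, 0, "green", "yellow")
print(d_red_green, d_green_yellow)
```

In this toy data, "red" and "green" co-occur with the same shape but different tastes, while "green" and "yellow" differ on both, so the second pair comes out as more dissimilar. A function like `record_dissimilarity` could then serve as the distance in a nearest neighbor classifier, which is the use the abstract evaluates.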

Keywords:

Probability distribution
Pattern recognition
Mathematics
Statistical hypothesis testing
Artificial intelligence
Classifier (linguistics)
Categorical variable
Statistics
k-nearest neighbors algorithm
Conditional probability
Conditional probability distribution
nearest neighbor classifier
Machine learning