
Canopy clustering algorithm

The canopy clustering algorithm is an unsupervised pre-clustering algorithm introduced by Andrew McCallum, Kamal Nigam and Lyle Ungar in 2000. It is often used as a preprocessing step for the K-means algorithm or the hierarchical clustering algorithm. It is intended to speed up clustering operations on large data sets, where using another algorithm directly may be impractical due to the size of the data set. The algorithm uses two thresholds, $T_{1}$ (the loose distance) and $T_{2}$ (the tight distance), where $T_{1} > T_{2}$.
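The role of the two thresholds can be illustrated with a minimal Python sketch. It follows the commonly described procedure: take a point from the remaining set as a new canopy center, add every remaining point closer than $T_{1}$ to that canopy, and drop from the remaining set every point closer than $T_{2}$. The function name `canopy_clusters`, the use of Euclidean distance, and the choice of always taking the first remaining point as the center (rather than a random one) are assumptions made for this illustration, not details taken from the text above.

```python
import math


def canopy_clusters(points, t1, t2, distance=math.dist):
    """Group points into (possibly overlapping) canopies.

    points   -- sequence of coordinate tuples
    t1       -- loose distance: points closer than this join the canopy
    t2       -- tight distance: points closer than this are removed from
                the pool and cannot start or join later canopies
    distance -- distance function (assumption: Euclidean by default)

    Returns a list of (center_index, member_indices) pairs.
    """
    if not t1 > t2:
        raise ValueError("t1 (loose) must be greater than t2 (tight)")

    remaining = list(range(len(points)))  # indices not yet tightly covered
    canopies = []

    while remaining:
        # Assumption: take the first remaining point as the next center
        # so the result is deterministic; a random pick is also common.
        center = remaining.pop(0)
        members = [center]
        still_remaining = []

        for i in remaining:
            d = distance(points[center], points[i])
            if d < t1:
                members.append(i)          # inside the loose radius
            if d >= t2:
                still_remaining.append(i)  # outside the tight radius

        remaining = still_remaining
        canopies.append((center, members))

    return canopies


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.5, 0.0), (1.5, 0.0), (10.0, 10.0), (10.4, 10.0)]
    for center, members in canopy_clusters(data, t1=2.0, t2=1.0):
        print(center, members)
```

In this sketch a point that lies between the tight and loose radii of a center stays in the pool, so it can belong to more than one canopy. A more expensive algorithm such as K-means can then restrict its exact distance computations to points that share a canopy.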

Related topics: Fuzzy clustering, Correlation clustering, Fixed-radius near neighbors, Ball tree, Biclustering, Data stream clustering, Conceptual clustering