THE EFFECTS OF WITHIN‐GROUP COVARIANCE STRUCTURE ON RECOVERY IN CLUSTER ANALYSIS: II. EXTENSION TO THE P‐DIMENSIONAL CASE

John R. Donoghue

1999

A previous paper (Donoghue, 1995a) found that failure to account for within-group covariance structure can greatly affect the ability of commonly used cluster analysis procedures to recover known subgroup structure. This follow-up study used Monte Carlo methods to extend those results to the more general case of more than two groups and higher-dimensional data. Data were generated according to a finite normal mixture model. Distance between group centroids, number of groups, relative group sizes, number of variables, and within-group covariance structure were varied. Each dataset was analyzed using 11 hierarchical clustering algorithms. The effects of within-group covariance structure were generalized, and proved even more vexing than had been found in the two-group, bivariate case. Of the clustering algorithms, flexible average clustering (β = -.20 or β = -.25) gave the best cluster recovery, followed by Ward's method. These results are interpreted in terms of the weakness of Euclidean distance as a measure of inter-entity similarity. Based on preliminary results, some possible alternate measures of similarity are identified. However, more work is needed before clear recommendations about the choice of similarity measure can be given.
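
The design described above can be illustrated with a minimal sketch of a single Monte Carlo replication. It is not the study's code. Everything specific here is an assumption made for illustration: the function names (`simulate_mixture`, `flexible_average`, `adjusted_rand`), the equal-correlation covariance matrix, the centroid layout, the group sizes, and the use of the adjusted Rand index as the recovery measure, which the abstract does not name. The flexible average step uses the standard Lance-Williams update with α_i = (1 - β) n_i / (n_i + n_j) and γ = 0.

```python
import numpy as np

def simulate_mixture(n_per_group, centroids, cov, rng):
    """Draw points from a normal mixture with one shared within-group covariance."""
    X, labels = [], []
    for g, (n, mu) in enumerate(zip(n_per_group, centroids)):
        X.append(rng.multivariate_normal(mu, cov, size=n))
        labels.extend([g] * n)
    return np.vstack(X), np.array(labels)

def flexible_average(X, n_clusters, beta=-0.25):
    """Agglomerative clustering on Euclidean distances using the
    Lance-Williams update for flexible average linkage."""
    n = len(X)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    np.fill_diagonal(D, np.inf)           # never merge a cluster with itself
    size = np.ones(n)
    active = np.ones(n, dtype=bool)
    labels = np.arange(n)                 # each point starts as its own cluster
    for _ in range(n - n_clusters):
        # Closest pair among clusters that are still active.
        masked = np.where(active[:, None] & active[None, :], D, np.inf)
        i, j = np.unravel_index(np.argmin(masked), masked.shape)
        a_i = (1 - beta) * size[i] / (size[i] + size[j])
        a_j = (1 - beta) * size[j] / (size[i] + size[j])
        new = a_i * D[i] + a_j * D[j] + beta * D[i, j]
        D[i, :] = new                     # merged cluster keeps index i
        D[:, i] = new
        D[i, i] = np.inf
        active[j] = False                 # index j is retired
        size[i] += size[j]
        labels[labels == j] = i
    return labels

def adjusted_rand(a, b):
    """Adjusted Rand index between two partitions (assumed recovery measure)."""
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    table = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(table, (ai, bi), 1)         # contingency table of the two partitions
    pairs = lambda x: x * (x - 1) / 2
    sum_ij = pairs(table).sum()
    sum_a = pairs(table.sum(axis=1)).sum()
    sum_b = pairs(table.sum(axis=0)).sum()
    expected = sum_a * sum_b / pairs(len(a))
    return (sum_ij - expected) / ((sum_a + sum_b) / 2 - expected)

# One replication with illustrative (assumed) settings.
rng = np.random.default_rng(0)
p, k, rho, d = 4, 3, 0.6, 6.0             # variables, groups, correlation, separation
cov = np.full((p, p), rho) + (1 - rho) * np.eye(p)
centroids = d * np.eye(p)[:k]             # one centroid per coordinate axis
X, truth = simulate_mixture([30, 30, 30], centroids, cov, rng)
pred = flexible_average(X, k, beta=-0.25)
ari = adjusted_rand(truth, pred)
print(f"adjusted Rand index: {ari:.3f}")
```

A full study in this style would repeat the replication over a grid of centroid distances, group counts, relative group sizes, dimensions, and covariance structures, and would compare several linkage rules on each dataset.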
