THE EFFECTS OF WITHIN‐GROUP COVARIANCE STRUCTURE ON RECOVERY IN CLUSTER ANALYSIS: II. EXTENSION TO THE P‐DIMENSIONAL CASE

1999 
A previous paper (Donoghue, 1995a) found that failure to account for within-group covariance structure can greatly affect the ability of commonly used cluster analysis procedures to recover known subgroup structure. This follow-up study used Monte Carlo methods to extend those results to the more general case of more than two groups and higher-dimensional data. Data were generated according to a finite normal mixture model. Distance between group centroids, number of groups, relative group sizes, number of variables, and within-group covariance structure were varied. Each dataset was analyzed using 11 hierarchical clustering algorithms. The effects of within-group covariance structure generalized to this setting and proved even more vexing than in the two-group, bivariate case. Of the clustering algorithms, flexible average clustering (β = -.20 or β = -.25) gave the best cluster recovery, followed by Ward's method. These results are interpreted in terms of the weakness of Euclidean distance as a measure of inter-entity similarity. Based on preliminary results, some possible alternative measures of similarity are identified. However, more work is needed before clear recommendations about the choice of similarity measure can be given.
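The kind of experiment the abstract describes can be illustrated with a minimal sketch. It is not the study's actual procedure: the group means, group sizes, within-group correlation (`rho`), and the use of the plain Rand index as the recovery measure are all assumptions chosen for illustration, and the data are bivariate with three groups only. Flexible average clustering is written here as the Lance-Williams update with weights (1 - β)·n_i/(n_i + n_j), (1 - β)·n_j/(n_i + n_j), and β, which is the usual formulation of the method; the function names are hypothetical.

```python
import math
import random

def sample_group(n, mean, rho, rng):
    """Draw n bivariate normal points with unit variances and correlation rho."""
    pts = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mean[0] + z1
        y = mean[1] + rho * z1 + math.sqrt(1 - rho ** 2) * z2
        pts.append((x, y))
    return pts

def flexible_average(points, k, beta=-0.25):
    """Agglomerate to k clusters using the flexible average Lance-Williams update."""
    n = len(points)
    # Euclidean distance between every pair of entities.
    d = [[math.dist(p, q) for q in points] for p in points]
    size = [1] * n
    members = {i: [i] for i in range(n)}
    while len(members) > k:
        active = sorted(members)
        # Find the closest pair of active clusters.
        i, j = min(
            ((a, b) for pos, a in enumerate(active) for b in active[pos + 1:]),
            key=lambda ab: d[ab[0]][ab[1]],
        )
        dij = d[i][j]
        wi = (1 - beta) * size[i] / (size[i] + size[j])
        wj = (1 - beta) * size[j] / (size[i] + size[j])
        # Update distances from the merged cluster (kept at index i) to the rest.
        for m in active:
            if m != i and m != j:
                d[i][m] = d[m][i] = wi * d[i][m] + wj * d[j][m] + beta * dij
        size[i] += size[j]
        members[i].extend(members.pop(j))
    labels = [0] * n
    for lab, idx in enumerate(members.values()):
        for t in idx:
            labels[t] = lab
    return labels

def rand_index(a, b):
    """Proportion of entity pairs on which two partitions agree."""
    n = len(a)
    agree = sum(
        (a[i] == a[j]) == (b[i] == b[j])
        for i in range(n) for j in range(i + 1, n)
    )
    return agree / (n * (n - 1) / 2)

# Assumed design: three groups of 30, correlated within-group structure.
rng = random.Random(0)
means = [(0.0, 0.0), (4.0, 0.0), (2.0, 4.0)]
truth = [g for g in range(3) for _ in range(30)]
data = [p for m in means for p in sample_group(30, m, 0.8, rng)]
labels = flexible_average(data, k=3, beta=-0.25)
print("Rand index:", rand_index(truth, labels))
```

Varying `rho`, the spacing of `means`, or the group sizes and repeating over many random seeds gives a rough feel for how within-group covariance affects recovery when clustering relies on Euclidean distance.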