Some Basic Elements in Clustering and Classification

2016 
This chapter deals with basic tools useful in clustering and classification and present some commonly used approaches for these two problems. Since several chapters in these proceedings are devoted to approaches to deal with classification, we give more attention in this chapter to clustering issues. We are first concerned with notions of distances or dissimilarities between objects we are to group in clusters. Then based on these inter-objects distances we define distances between sets of objects, such as single linkage, complete linkage or Ward distance. Three clustering algorithms are presented with some details and compared: Kmeans, Ascendant Hierarchical and DBSCAN algorithms. The comparison between partitions and the issue of choosing the correct number of clusters are investigated and the proposed procedures are tested on two data sets. We emphasize the fact that the results provided by the numerous indices available in the literature for selecting the number of clusters is largely depending upon the shape and the dispersion we are assuming for these clusters. Finally the last section is devoted to classification. Some basic notions such as training sets, test sets and cross-validation are discussed. Two particular approaches are detailed, the K-nearest neighbors method and the logistic regression, and comparisons with LDA (Linear Discriminant Analysis) and QDA (Quadratic Discriminant Analysis) are analyzed.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    5
    References
    0
    Citations
    NaN
    KQI
    []