Simplex Based Vector Mapping for Categorical Attributes Clustering

2018 
When clustering unlabeled data, categorical attributes are usually treated differently from numerical attributes because of their unique characteristics, which introduces difficulties in clustering data with both types of attributes. In this paper, we propose a strategy to map categorical attributes to high dimensional vectors based on the Simplex Theory, hence categorical attributes could be handled the same as numeral attributes. To achieve identical distances between any two values under Euclidean distance, we theoretically prove a categorical attribute with n types of values should be mapped to at least n--1 dimensional vectors. Furthermore, numerical vector mapping solutions are provided on condition of 0 normalized constraint. Experimentally, we show that integrating our vector mapping strategy with K-means algorithm achieves better accuracy than integrating similarities for categorical attributes with K-modes algorithm on four datasets.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    16
    References
    0
    Citations
    NaN
    KQI
    []