Multi-view heterogeneous fusion and embedding for categorical attributes on mixed data

2019 
Categorical attributes are ubiquitous in real-world data. However, such attributes lack a well-defined distance metric and cannot be manipulated directly with algebraic operations, so many data mining algorithms cannot work on them directly. Learning an appropriate metric or an effective numerical embedding is vital yet challenging for categorical attributes with multi-view heterogeneous data characteristics. This paper proposes a novel multi-view heterogeneous fusion model (MVHF), which first captures basic coupling information for each view and then fuses this heterogeneous information from the different views by multi-kernel metric learning to measure the intrinsic distances between categorical attributes of this type. Based on these measured distances, we then use a manifold learning method to learn a high-quality numerical embedding for each categorical value. Experiments on 33 mixed data sets demonstrate that MVHF-enabled classification significantly improves performance compared with state-of-the-art distance metrics or embedding competitors.
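The three stages named in the abstract (per-view distances, fusion, embedding) can be illustrated with a minimal sketch. This is not the MVHF implementation, and every name, the toy data, and every modelling choice in it is an assumption: each "view" is reduced to the total variation distance between conditional distributions, the multi-kernel metric learning step is replaced by a fixed-weight average, and the manifold learning step is replaced by classical multidimensional scaling.

```python
import numpy as np

def view_distances(column, context):
    """One hypothetical 'view': distance between two values of `column` is the
    total variation distance between their conditional distributions over `context`."""
    values = sorted(set(column))
    ctx_values = sorted(set(context))
    # Row i holds P(context value | i-th value of column).
    cond = np.zeros((len(values), len(ctx_values)))
    for v, c in zip(column, context):
        cond[values.index(v), ctx_values.index(c)] += 1
    cond /= cond.sum(axis=1, keepdims=True)
    diff = np.abs(cond[:, None, :] - cond[None, :, :]).sum(axis=2)
    return values, 0.5 * diff

def fuse(distance_matrices, weights):
    """Fixed-weight average of per-view distances.
    Assumption: stands in for the learned multi-kernel fusion, which is not reproduced here."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * d for w, d in zip(weights, distance_matrices))

def classical_mds(dist, dim=2):
    """Classical MDS: embed points so Euclidean distances approximate `dist`.
    Assumption: a simple substitute for the paper's manifold learning step."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:dim]
    top = np.clip(eigvals[order], 0.0, None)  # drop tiny negative eigenvalues
    return eigvecs[:, order] * np.sqrt(top)

# Toy data (invented for illustration): embed the values of `color`.
color = ["red", "red", "blue", "blue", "green", "green"]
shape = ["round", "round", "square", "square", "round", "square"]
label = ["a", "a", "b", "b", "a", "b"]

values, d_shape = view_distances(color, shape)  # view 1: coupling with another attribute
_, d_label = view_distances(color, label)       # view 2: coupling with the class label
fused = fuse([d_shape, d_label], weights=[0.5, 0.5])
embedding = classical_mds(fused, dim=2)         # one numerical vector per categorical value
```

Each row of `embedding` is the numerical vector for the corresponding entry of `values`, so a downstream classifier could consume it in place of the raw category.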