Improved support vector machine algorithm for heterogeneous data

2015 
A support vector machine (SVM) is a popular algorithm for classification learning. The classical SVM effectively handles classification tasks defined by numerical attributes. However, practical tasks involve both numerical and nominal attributes, and the classical SVM does not fully account for the difference between them. Nominal attributes are usually treated as numerical after coding, which may degrade the performance of learning algorithms. In this study, we propose a novel SVM algorithm for learning with heterogeneous data, called the heterogeneous SVM (HSVM). Instead of coding nominal attributes directly, the proposed algorithm learns a mapping that embeds them into a real space by minimizing an estimated generalization error. Extensive experiments show that HSVM improves classification performance for both nominal and heterogeneous data.

Highlights
• We propose an algorithm that maps nominal features to a numerical space by minimizing an estimated generalization error.
• We integrate the mapping algorithm with support vector machines, yielding an improved algorithm for learning from heterogeneous data.
• Experiments show that the proposed technique is effective for learning with heterogeneous data and also helps with imbalanced tasks.
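The contrast between direct coding and a learned mapping can be illustrated with a small sketch. It is not the HSVM procedure: the abstract does not give the error estimate or the optimizer, so this sketch swaps in a mean logistic loss minimized by plain gradient descent as a stand-in objective. The function names, the toy "color" attribute, and the threshold classifier used for comparison are all hypothetical.

```python
import math


def integer_code(values):
    """Baseline: code each nominal value as an arbitrary number
    (its order of first appearance), which imposes a spurious ordering."""
    codes = {}
    for v in values:
        codes.setdefault(v, float(len(codes)))
    return codes


def learn_mapping(values, labels, steps=500, lr=0.5):
    """Learn one real number per nominal value by gradient descent.

    Stand-in objective (an assumption, not HSVM's estimated generalization
    error): mean logistic loss log(1 + exp(-y * z)), where z is the real
    value assigned to the sample's nominal value and y is in {-1, +1}.
    """
    mapping = {v: 0.0 for v in values}
    n = len(values)
    for _ in range(steps):
        grad = {v: 0.0 for v in mapping}
        for v, y in zip(values, labels):
            # derivative of log(1 + exp(-y * z)) with respect to z
            grad[v] += -y / (1.0 + math.exp(y * mapping[v]))
        for v in mapping:
            mapping[v] -= lr * grad[v] / n
    return mapping


def best_threshold_accuracy(codes, values, labels):
    """Best training accuracy of a one-dimensional linear classifier
    (a single threshold with either orientation) on the coded attribute."""
    xs = [codes[v] for v in values]
    distinct = sorted(set(xs))
    # one cut below all points, plus one between each pair of neighbours
    cuts = [distinct[0] - 1.0]
    cuts += [(a + b) / 2.0 for a, b in zip(distinct, distinct[1:])]
    best = 0.0
    for t in cuts:
        for sign in (1, -1):
            correct = sum(
                1 for x, y in zip(xs, labels)
                if (sign if x > t else -sign) == y
            )
            best = max(best, correct / len(xs))
    return best


# Toy nominal attribute: "red" and "blue" are positive, "green" is negative.
values = ["red", "green", "blue", "red", "green", "blue"]
labels = [1, -1, 1, 1, -1, 1]

naive = integer_code(values)
learned = learn_mapping(values, labels)

print("integer coding:", naive,
      "accuracy:", best_threshold_accuracy(naive, values, labels))
print("learned mapping:", learned,
      "accuracy:", best_threshold_accuracy(learned, values, labels))
```

In this toy case the integer coding places "green" between "red" and "blue", so no single threshold separates the classes, while the learned values put "red" and "blue" on one side of zero and "green" on the other. A faithful implementation would replace the logistic loss with the paper's error estimate and retrain the SVM as the mapping changes.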