Exploring the effect of data reduction on Neural Network and Support Vector Machine classification

2017 
Abstract
Neural Networks and Support Vector Machines (SVMs) are two of the most popular and efficient supervised classification models. However, on large datasets both suffer from high memory requirements and high computational cost. In Data Mining applications, data reduction techniques attempt to reduce the size of the training dataset in terms of the number of instances, either by selecting some of the existing instances or by generating new training instances. The idea is to speed up the application of the data mining algorithm with minimal or no sacrifice in performance. Data reduction techniques have been extensively used in the context of k-Nearest Neighbor classification, a lazy classifier that works by directly using the training dataset rather than building a model. This paper explores the application of data reduction techniques as a preprocessing step before the training of Neural Networks and SVMs. Furthermore, the paper proposes a new data reduction technique based on the k-median clustering algorithm. Our experimental results illustrate that, in the case of SVMs, data reduction techniques can effectively reduce the dataset size while incurring only small performance degradation. In the case of Neural Networks, the performance loss is somewhat greater for the same data reduction rate, but both the SVM and Neural Network models outperform the k-NN approach that is typically used in Data Mining applications.
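
The abstract does not spell out the proposed technique, so the following is only a minimal sketch of the general idea it describes: per-class k-median clustering that replaces each class's instances with the cluster medians, which are then used to train the final classifier. The function names (k_median, reduce_dataset), the Lloyd-style update, and the reduction-rate parameter are all illustrative assumptions, not the authors' exact method.

    import numpy as np

    def k_median(X, k, n_iter=50, seed=None):
        # Lloyd-style k-median: L1-distance assignment,
        # coordinate-wise median update (illustrative assumption).
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
        for _ in range(n_iter):
            # Distance of every point to every center under the L1 norm.
            d = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)
            labels = d.argmin(axis=1)
            for j in range(k):
                members = X[labels == j]
                if len(members) > 0:
                    centers[j] = np.median(members, axis=0)
        return centers

    def reduce_dataset(X, y, rate=0.1, seed=0):
        # Replace each class's instances with k generated prototypes
        # (the medians), where k is a fraction "rate" of the class size.
        Xr, yr = [], []
        for c in np.unique(y):
            Xc = X[y == c]
            k = max(1, int(rate * len(Xc)))
            Xr.append(k_median(Xc, k, seed=seed))
            yr.append(np.full(k, c))
        return np.vstack(Xr), np.concatenate(yr)

The reduced set would then replace the full one in training, e.g. fitting sklearn's SVC (or a neural network) on reduce_dataset(X_train, y_train) rather than on (X_train, y_train). Clustering each class separately keeps the generated prototypes correctly labeled, which is why the reduction is applied per class rather than over the whole dataset.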