The Effect of Parallelism on Data Reduction

Pavlos Ponos, Stefanos Ougiaroglou, Georgios Evangelidis

2019

In this paper, we investigate the effect of parallelism on two data reduction algorithms that use k-Means clustering to find homogeneous clusters in the training set. By homogeneous, we mean clusters in which all instances belong to the same class. Our approach divides the training set into subsets and applies the data reduction algorithm to each subset in parallel. The reduced subsets are then merged into the final reduced set. In our experimental study, we split the datasets into 8, 16, 32 and 64 subsets. The results show that parallelism can achieve very low preprocessing costs. Moreover, when the number of subsets is high, for some datasets the accuracy of k-NN classification is almost equal to, if not better than, the accuracy achieved with the standard execution of the reduction algorithms, at the cost of a small loss in the reduction rate.
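The split, reduce and merge pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the two reduction algorithms, so `reduce_subset` below is an assumed, simplified reduction that recursively applies k-Means (seeded with one mean per class present) until every cluster is homogeneous and then keeps one centroid per cluster. The function names, the round-robin split and the use of a thread pool are likewise assumptions made for illustration. In CPython a process pool would be needed for real CPU parallelism.

```python
from concurrent.futures import ThreadPoolExecutor

def centroid(points):
    """Component-wise mean of a list of equal-length feature tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def group_by_label(items):
    """Map each class label to the feature tuples that carry it."""
    groups = {}
    for x, y in items:
        groups.setdefault(y, []).append(x)
    return groups

def reduce_subset(items, max_iter=10):
    """Assumed reduction: one (centroid, label) prototype per homogeneous cluster."""
    prototypes, queue = [], [list(items)]
    while queue:
        cluster = queue.pop()
        by_label = group_by_label(cluster)
        if len(by_label) == 1:  # homogeneous: keep only the cluster centroid
            label = next(iter(by_label))
            prototypes.append((centroid(by_label[label]), label))
            continue
        # Non-homogeneous: run k-Means with one initial mean per class present.
        means = [centroid(pts) for pts in by_label.values()]
        for _ in range(max_iter):
            parts = [[] for _ in means]
            for x, y in cluster:
                nearest = min(range(len(means)), key=lambda i: sq_dist(x, means[i]))
                parts[nearest].append((x, y))
            new_means = [centroid([x for x, _ in p]) if p else m
                         for p, m in zip(parts, means)]
            if new_means == means:
                break
            means = new_means
        parts = [p for p in parts if p]
        if len(parts) == 1:
            # k-Means could not split the cluster (e.g. identical points with
            # different labels): fall back to one prototype per class.
            prototypes.extend((centroid(pts), lab) for lab, pts in by_label.items())
        else:
            queue.extend(parts)
    return prototypes

def parallel_reduce(items, n_subsets, workers=4):
    """Split round-robin (an assumption), reduce each subset, merge the results."""
    subsets = [items[i::n_subsets] for i in range(n_subsets)]
    subsets = [s for s in subsets if s]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        reduced = list(pool.map(reduce_subset, subsets))
    return [proto for part in reduced for proto in part]

# Toy training set: two well-separated classes in two dimensions.
toy = [((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"), ((1.0, 1.0), "a"),
       ((10.0, 10.0), "b"), ((10.0, 11.0), "b"), ((11.0, 10.0), "b"), ((11.0, 11.0), "b")]
print(len(reduce_subset(toy)), len(parallel_reduce(toy, n_subsets=2)))
```

Because each subset is reduced independently, the merged set generally contains more prototypes than a single run over the whole training set, which is consistent with the small loss in reduction rate reported above.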
