Design and Analysis of Parallel MapReduce based KNN-join Algorithm for Big Data Classification
2014
Multi-label classification is required in many modern data mining applications. The k-nearest neighbour (KNN) join is a useful data mining approach that offers high accuracy but is time-consuming. With the recent explosion of big data, conventional serial KNN-join-based multi-label classification algorithms spend a long time handling high volumes of data. To address this problem, we first design a parallel MapReduce-based KNN join algorithm for big data classification. We then implement the algorithm using Hadoop on a cluster of 9 virtual machines. Experimental results show that our MapReduce-based KNN join exhibits much higher performance than the serial one. Several interesting phenomena are observed from the experimental results.
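The abstract does not give the algorithm's details, so the following is only a minimal sketch of how a KNN join can be split into map and reduce steps, simulated in plain Python rather than on Hadoop. It is not the authors' implementation. The function names (`map_phase`, `reduce_phase`, `knn_join_classify`), the block partitioning scheme, the squared Euclidean distance, and the single-label majority vote are all assumptions made for illustration.

```python
import heapq
from collections import defaultdict


def map_phase(train_block, test_points):
    """Mapper (hypothetical): for one block of training data, emit
    (test_id, (squared_distance, label)) for every test point."""
    for features, label in train_block:
        for test_id, query in enumerate(test_points):
            dist = sum((a - b) ** 2 for a, b in zip(features, query))
            yield test_id, (dist, label)


def reduce_phase(test_id, candidates, k):
    """Reducer (hypothetical): keep the k nearest candidates for one
    test point and return the most frequent label among them."""
    nearest = heapq.nsmallest(k, candidates, key=lambda c: c[0])
    votes = defaultdict(int)
    for _, label in nearest:
        votes[label] += 1
    return test_id, max(votes, key=votes.get)


def knn_join_classify(train, test_points, k=3, num_blocks=2):
    """Simulate map -> shuffle -> reduce in a single process."""
    # Assumed partitioning: round-robin split of the training set.
    blocks = [train[i::num_blocks] for i in range(num_blocks)]

    # Shuffle step: group mapper output by test point id.
    grouped = defaultdict(list)
    for block in blocks:
        for test_id, candidate in map_phase(block, test_points):
            grouped[test_id].append(candidate)

    return dict(reduce_phase(tid, cands, k) for tid, cands in grouped.items())


# Toy data invented for this example only.
train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
test_points = [(0.2, 0.1), (5.5, 5.2)]
predictions = knn_join_classify(train, test_points, k=3)
```

In this sketch each mapper works on its own training block independently, which is the part that a cluster could run in parallel; the reducer only needs the candidates for a single test point. A multi-label variant would replace the majority vote with a per-label decision over the k neighbours.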