Design and Analysis of Parallel MapReduce based KNN-join Algorithm for Big Data Classification

Xuesong Yan

2014

Multi-label classification is required in many modern data mining applications. The k-nearest neighbour (KNN) join is a useful data mining approach that achieves high accuracy but is time-consuming. With the recent explosion of big data, the conventional serial KNN-join-based multi-label classification algorithm takes a long time to handle high volumes of data. To address this problem, we first design a parallel MapReduce-based KNN join algorithm for big data classification. We then implement the algorithm using Hadoop on a cluster of 9 virtual machines. Experimental results show that our MapReduce-based KNN join exhibits much higher performance than the serial version. Several interesting phenomena are observed in the experimental results.
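
The abstract does not spell out how the join is partitioned, so the following is only a minimal single-process sketch of one common way a KNN join can be phrased as map and reduce steps, not the paper's implementation. It assumes a block-based scheme: the query set R and the labelled training set S are split into blocks, each map call finds local candidates for one (R block, S block) pair, and the reduce step merges candidates per query. The names `split`, `map_block`, `reduce_candidates`, `knn_join`, and `vote_labels`, as well as the majority-vote rule for multi-label prediction, are illustrative assumptions.

```python
import heapq
from collections import Counter, defaultdict
from itertools import product


def split(rows, n):
    """Split rows into n blocks (round-robin); stands in for input splits."""
    return [rows[i::n] for i in range(n)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_block(r_block, s_block, k):
    """Map step: for one (R block, S block) pair, emit each query id
    with its k nearest candidates found inside this S block only."""
    for rid, rvec in r_block:
        local = heapq.nsmallest(
            k, ((sq_dist(rvec, svec), labels) for svec, labels in s_block)
        )
        yield rid, local


def reduce_candidates(candidates, k):
    """Reduce step: merge candidates from all S blocks, keep the global k nearest."""
    return heapq.nsmallest(k, candidates)


def knn_join(R, S, k, n_blocks=2):
    """Run the map and reduce steps in-process; the dict plays the shuffle role."""
    grouped = defaultdict(list)
    for r_block, s_block in product(split(R, n_blocks), split(S, n_blocks)):
        for rid, local in map_block(r_block, s_block, k):
            grouped[rid].extend(local)
    return {rid: reduce_candidates(c, k) for rid, c in grouped.items()}


def vote_labels(neighbours, k):
    """Assumed multi-label rule: keep labels carried by more than half of the k neighbours."""
    counts = Counter(label for _, labels in neighbours for label in labels)
    return sorted(label for label, c in counts.items() if c > k / 2)


# Toy data: queries are (id, vector); training rows are (vector, tuple of labels).
R = [("q1", (0.0, 0.0)), ("q2", (10.0, 10.0))]
S = [
    ((0.0, 1.0), ("a",)),
    ((1.0, 0.0), ("a", "b")),
    ((9.0, 10.0), ("c",)),
    ((10.0, 9.0), ("c",)),
    ((5.0, 5.0), ("b",)),
]
result = knn_join(R, S, k=2, n_blocks=2)
predicted = {rid: vote_labels(nbrs, 2) for rid, nbrs in result.items()}
```

Because every S block contributes its own local top k for each query, the merged top k equals what a serial scan over all of S would return; the cost is that each query is compared against every S block, which is the work a real Hadoop job would spread across mappers.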

Keywords:

Big data
Algorithm
Data mining
Phenomenon
Computer science
