Parallelization support vector machine (SVM) solving method based on Hadoop

Lin Shang, Yubin Yang, Aibao Luo, gaoyang

2012

The invention discloses a parallelized SVM solving method based on Hadoop. The method comprises the following steps: storing the data in a distributed cluster file system; randomly sampling each data block according to the distribution of the data and assigning the randomly selected samples one by one to form a plurality of data subsets; running the local first method on each data subset; and fusing the per-subset results of the local first method by averaging them and outputting the averaged result. With this method, Pegasos solving can be applied to massive data without loss of accuracy, running time is greatly shortened, and the method scales well.
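The steps in the abstract can be sketched in a few lines of single-process Python. This is a minimal illustration under stated assumptions, not the patented implementation: the per-subset solver is assumed to be the standard Pegasos stochastic sub-gradient update for a linear SVM, in-memory lists stand in for the blocks of the distributed file system, and all function names and parameters (`split_randomly`, `pegasos`, `average_weights`, `parallel_svm`, `lam`, `iterations`) are hypothetical. No Hadoop API is used; on a cluster, the per-subset training would presumably run in parallel tasks and the averaging in a final aggregation step.

```python
import random


def split_randomly(blocks, num_subsets, seed=0):
    """Shuffle each block, then deal its samples one by one into subsets."""
    rng = random.Random(seed)
    subsets = [[] for _ in range(num_subsets)]
    dealt = 0
    for block in blocks:
        shuffled = list(block)
        rng.shuffle(shuffled)
        for sample in shuffled:
            subsets[dealt % num_subsets].append(sample)
            dealt += 1
    return subsets


def pegasos(samples, lam=0.01, iterations=2000, seed=0):
    """Linear SVM via the Pegasos update (assumed local solver).

    Each sample is a pair (x, y) with x a list of floats and y in {+1, -1}.
    """
    rng = random.Random(seed)
    w = [0.0] * len(samples[0][0])
    for t in range(1, iterations + 1):
        x, y = rng.choice(samples)
        eta = 1.0 / (lam * t)  # step size 1 / (lambda * t)
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        shrink = 1.0 - eta * lam  # regularization shrinkage
        w = [shrink * wi for wi in w]
        if margin < 1.0:  # hinge loss active: add the sub-gradient step
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def average_weights(weight_vectors):
    """Fuse the local models by averaging them component-wise."""
    n = len(weight_vectors)
    return [sum(column) / n for column in zip(*weight_vectors)]


def parallel_svm(blocks, num_subsets=4, lam=0.01, iterations=2000):
    """Sample into subsets, solve each locally, and average the results."""
    subsets = split_randomly(blocks, num_subsets)
    local_models = [
        pegasos(subset, lam, iterations, seed=i)
        for i, subset in enumerate(subsets)
    ]
    return average_weights(local_models)


def predict(w, x):
    """Classify x with the sign of the linear score <w, x>."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1
```

Because each local model is trained independently, the loop inside `parallel_svm` has no shared state, which is what makes the scheme straightforward to distribute across workers.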

Keywords:

Parallel computing
Support vector machine
Block (data storage)
File system
Sampling (statistics)
Computer science
Theoretical computer science
Operation time
Data mining
