Large-scale logistic regression and linear support vector machines using Spark

Chieh-Yen Lin, Cheng-Hao Tsai, Ching-pei Lee, Chih-Jen Lin

2014

Logistic regression and linear SVM are useful methods for large-scale classification. However, their distributed implementations have not been well studied. Because the MapReduce framework is inefficient for iterative algorithms, Spark, an in-memory cluster-computing platform, was recently proposed. It has emerged as a popular framework for large-scale data processing and analytics. In this work, we consider a distributed Newton method for solving logistic regression as well as linear SVM and implement it on Spark. We carefully examine many implementation issues that significantly affect the running time and propose our solutions. After conducting thorough empirical investigations, we release an efficient and easy-to-use tool for the Spark community.
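To make the idea of a distributed Newton method concrete, the following is a minimal sketch for L2-regularized logistic regression. It is not the authors' implementation and does not use Spark: plain Python lists stand in for data partitions, each partition computes its local loss, gradient, and Hessian-vector contributions, and a driver sums them. All function names are hypothetical, and the conjugate gradient solver with a backtracking line search is an assumption made for illustration.

```python
import math
from functools import reduce

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def axpy(t, a, b):
    # Returns t * a + b, elementwise.
    return [t * x + y for x, y in zip(a, b)]

def vec_sum(vectors):
    # "Reduce" step: sum the per-partition vectors on the driver.
    return reduce(lambda a, b: axpy(1.0, a, b), vectors)

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log1pexp(z):
    # Numerically stable log(1 + exp(z)).
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))

def local_loss(part, w):
    return sum(log1pexp(-y * dot(w, x)) for x, y in part)

def local_grad(part, w):
    # One partition's contribution to the gradient of the loss term.
    g = [0.0] * len(w)
    for x, y in part:
        g = axpy((sigmoid(y * dot(w, x)) - 1.0) * y, x, g)
    return g

def local_hess_vec(part, w, v):
    # One partition's contribution to the Hessian-vector product.
    hv = [0.0] * len(w)
    for x, y in part:
        s = sigmoid(y * dot(w, x))
        hv = axpy(s * (1.0 - s) * dot(x, v), x, hv)
    return hv

def objective(parts, w, C):
    return 0.5 * dot(w, w) + C * sum(local_loss(p, w) for p in parts)

def gradient(parts, w, C):
    return axpy(C, vec_sum(local_grad(p, w) for p in parts), w)

def hess_vec(parts, w, v, C):
    return axpy(C, vec_sum(local_hess_vec(p, w, v) for p in parts), v)

def newton_step(parts, w, g, C, cg_iters=50, tol=1e-20):
    # Conjugate gradient solve of H s = -g using only Hessian-vector products,
    # so the Hessian matrix itself is never formed.
    s = [0.0] * len(w)
    r = [-gi for gi in g]
    d = list(r)
    rr = dot(r, r)
    for _ in range(cg_iters):
        if rr < tol:
            break
        hd = hess_vec(parts, w, d, C)
        alpha = rr / dot(d, hd)
        s = axpy(alpha, d, s)
        r = axpy(-alpha, hd, r)
        rr_new = dot(r, r)
        d = axpy(rr_new / rr, d, r)
        rr = rr_new
    return s

def train(parts, dim, C=1.0, iters=20):
    w = [0.0] * dim
    for _ in range(iters):
        g = gradient(parts, w, C)
        if dot(g, g) < 1e-12:
            break
        s = newton_step(parts, w, g, C)
        # Backtracking line search (sufficient-decrease condition).
        f0, slope, t = objective(parts, w, C), dot(g, s), 1.0
        while objective(parts, axpy(t, s, w), C) > f0 + 1e-4 * t * slope:
            t *= 0.5
        w = axpy(t, s, w)
    return w
```

In this sketch only fixed-size vectors (gradients and Hessian-vector products) cross partition boundaries, which is the property that makes the approach a natural fit for a map-and-reduce style of computation.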

Keywords:

Data mining
Machine learning
Artificial intelligence
Support vector machine
Analytics
Logistic regression
Computer science
Newton's method
Apache Spark
Inefficiency
Data processing
Implementation
