Fast semi-supervised discriminant analysis for binary classification of large data-sets

Joris Tavernier, Jaak Simm, Karl Meerbergen, Joerg Kurt Wegner, Hugo Ceulemans, Yves Moreau

2017

High-dimensional data requires scalable algorithms. We propose and analyze three related, scalable algorithms for semi-supervised discriminant analysis (SDA). These algorithms build on Krylov subspace methods, which exploit the sparsity of the data and the shift-invariance of Krylov subspaces. In addition, we improve the problem definition by adding centralization to the semi-supervised setting. The proposed methods are evaluated on an industry-scale data set from a pharmaceutical company, where the task is to predict compound activity on target proteins. The results show that SDA achieves good predictive performance and that our methods require only a few seconds, significantly improving computation time over the previous state of the art.
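The shift-invariance mentioned in the abstract is the standard property that, for a matrix A, a vector b and any scalar shift sigma, the Krylov subspace span{b, Ab, ..., A^(k-1) b} is unchanged when A is replaced by A + sigma*I. The sketch below is only a numerical illustration of that property, not the authors' algorithm: the matrix, the vector, the subspace dimension, the value of sigma and the helper name `krylov_basis` are all assumptions chosen for the example.

```python
import numpy as np

def krylov_basis(A, b, k):
    """Orthonormal basis of span{b, Ab, ..., A^(k-1) b} (Arnoldi-style Gram-Schmidt).

    Hypothetical helper for illustration; it is not taken from the paper.
    """
    n = b.shape[0]
    Q = np.zeros((n, k))
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(1, k):
        w = A @ Q[:, j - 1]
        # Orthogonalize twice against the previous vectors for numerical stability.
        for _ in range(2):
            w = w - Q[:, :j] @ (Q[:, :j].T @ w)
        Q[:, j] = w / np.linalg.norm(w)
    return Q

# Assumed toy problem: a random symmetric positive semi-definite matrix.
rng = np.random.default_rng(0)
n, k, sigma = 50, 5, 0.7
M = rng.standard_normal((n, n))
A = M @ M.T / n
b = rng.standard_normal(n)

Q = krylov_basis(A, b, k)
Q_shift = krylov_basis(A + sigma * np.eye(n), b, k)

# If both bases span the same subspace, projecting Q_shift onto span(Q)
# leaves (numerically) nothing behind.
residual = np.linalg.norm(Q_shift - Q @ (Q.T @ Q_shift))
print(f"projection residual: {residual:.2e}")
```

A residual at the level of rounding error indicates that one basis can serve several shifted systems, which is why a single Krylov subspace can be reused when the shift varies.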

Keywords:

Computer science
Data mining
Binary classification
Data set
Linear subspace
Linear discriminant analysis
Scalability
Computation
Exploit
Krylov subspace
