Scalable and parallel machine learning algorithms for statistical data mining - Practice & experience

Morris Riedel,M. Goetz,Matthias Richerzhagen,Philipp Glock,C. Bodenstein,Ahmed Shiraz Memon,Mohammad Shahbaz Memon

2015

Many scientific datasets (e.g. in the earth and medical sciences) are growing in volume or in dimensionality because of the ever-increasing quality of measurement devices. This contribution focuses on how such datasets can take advantage of new ‘big data’ technologies and frameworks, which are often based on parallelization methods. A substantial part of the contribution consists of lessons learned from medical and earth science data applications that require parallel classification and clustering techniques, such as support vector machines (SVMs) and density-based spatial clustering of applications with noise (DBSCAN). In addition, selected experiences with related ‘big data’ approaches and concrete mining techniques (e.g. dimensionality reduction and feature selection and extraction methods) are addressed. To overcome the identified challenges, we outline an architecture framework design that we implement with openly available tools to enable scalable and parallel machine learning applications in distributed systems.
