Finding Clusters and Patterns in Big Data Applications: State-of-the-Art Methods in Clustering Environments

Kehan Zhang, Zhenglin Wang, Lei Liu

2021

With the rapid development of computational power and machine learning algorithms, clustering methods have become a powerful tool for providing insights and detecting structure in datasets. They are especially important for big data applications, where the dimensionality is high and the amount of data is too large to be examined by humans. This paper provides a comprehensive review of state-of-the-art clustering methods and their recent progress, and analyzes their performance when applied to a high-dimensional expression-level dataset across multiple environments. Although PCA and MDS are very commonly used by researchers to cluster environments based on expression level, we show that UMAP and t-SNE greatly outperform these traditional methods. The conclusions of this paper can help researchers understand how the clustering methods work and choose the best method for related purposes; we also hope that, by combining the ideas and advantages of multiple clustering methods, novel clustering methods that are more powerful and universal can be developed.
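As a rough illustration of the kind of comparison described above, the sketch below embeds a synthetic dataset into two dimensions with PCA, MDS, and t-SNE and scores each embedding by how well known group labels separate. It is not the paper's pipeline: the data (`make_blobs`), the three groups standing in for "environments", the helper name `embed_all`, and the silhouette score as the quality measure are all assumptions made for this example, and scikit-learn is assumed to be installed. UMAP is left out because it lives in the separate `umap-learn` package; if that package is available, `umap.UMAP(n_components=2)` could be added to the dictionary in the same way.

```python
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.manifold import MDS, TSNE
from sklearn.metrics import silhouette_score

# Synthetic stand-in for an expression matrix (an assumption, not real data):
# 150 samples x 50 features drawn from 3 groups that play the role of environments.
X, env = make_blobs(n_samples=150, n_features=50, centers=3, random_state=0)


def embed_all(X, seed=0):
    """Return a dict mapping method name to a 2-D embedding of X."""
    return {
        "PCA": PCA(n_components=2).fit_transform(X),
        "MDS": MDS(n_components=2, random_state=seed).fit_transform(X),
        "t-SNE": TSNE(n_components=2, perplexity=30,
                      random_state=seed).fit_transform(X),
    }


embeddings = embed_all(X)

# Silhouette score of the known group labels in each 2-D embedding:
# values closer to 1 mean the groups are better separated.
scores = {name: silhouette_score(Y, env) for name, Y in embeddings.items()}

for name, score in scores.items():
    print(f"{name}: silhouette = {score:.3f}")
```

On well-separated synthetic blobs all three methods may score similarly; differences between linear and nonlinear embeddings tend to show up only on data with more complex structure, so the numbers printed here should not be read as a reproduction of the paper's result.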
