fMLC: Fast Multi-Level Clustering and Visualization of Large Molecular Datasets

Duong Vu, Sonja Georgievska, S. Szoke, Arnold Kuzniar, Vincent Robert

2018

Motivation: Despite successful applications of data clustering and visualization techniques in molecular sequence identification, current technologies still do not scale to large biological datasets. Results: We address this problem with a new multi-threaded tool, fMLC, developed primarily to cluster DNA sequences and supplemented with an interactive web-based visualization component, DiVE. fMLC made it possible to compare, cluster and visualize 350K fungal ITS sequences at the species level. Comparing and clustering the dataset took less than two hours, twelve times faster than the previously reported time. Availability: https://github.com/FastMLC/fMLC (doi: 10.5281/zenodo.926820). Contact: d.vu@westerdijkinstitute.nl.

Keywords:

Visualization
Cluster analysis
Data mining
Computer science
Bioinformatics
species level
molecular sequence
Creative visualization
