SuperMIC: Analyzing Large Biological Datasets in Bioinformatics with Maximal Information Coefficient

Chao Wang, Dong Dai, Xi Li, Aili Wang, Xuehai Zhou

2017

The maximal information coefficient (MIC) was proposed to discover relationships and associations between pairs of variables. Accelerating the MIC calculation remains a significant challenge for bioinformatics researchers, especially in genome sequencing and biological annotation. In this paper, we explore a parallel approach that uses the MapReduce framework to improve the efficiency and throughput of MIC computation. The acceleration system comprises biological data storage on HDFS, preprocessing algorithms, a distributed memory cache mechanism, and the partitioning of MapReduce jobs. Building on this acceleration approach, we extend the traditional two-variable algorithm to a multiple-variable algorithm. The experimental results show that our parallel solution provides a linear speedup over the original algorithm without affecting correctness or sensitivity.
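The idea of partitioning the pairwise MIC work into independent map tasks can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `approx_mic`, `map_pair`, and `all_pairs_mic` are hypothetical, the built-in `map` stands in for a MapReduce map phase, and the score uses fixed equal-frequency grids rather than the optimized grid search of the actual MIC algorithm, so it only approximates MIC.

```python
import math
from collections import Counter
from itertools import combinations


def equal_frequency_bins(values, k):
    """Assign each value to one of k rank-based bins of roughly equal size."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * k // len(values)
    return bins


def mutual_information(bx, by):
    """Mutual information (in nats) between two discrete label sequences."""
    n = len(bx)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def approx_mic(x, y, exponent=0.6):
    """Simplified MIC-style score (assumption: equal-frequency grids only).

    Tries every grid with nx * ny <= n ** exponent and keeps the largest
    normalized mutual information. The real MIC also optimizes where the
    grid lines are placed, which this sketch does not do.
    """
    budget = max(4, int(len(x) ** exponent))
    best = 0.0
    for nx in range(2, budget // 2 + 1):
        bx = equal_frequency_bins(x, nx)
        for ny in range(2, budget // nx + 1):
            by = equal_frequency_bins(y, ny)
            score = mutual_information(bx, by) / math.log(min(nx, ny))
            best = max(best, score)
    return best


def map_pair(task):
    """Map step: score one pair of variables, keyed by their indices."""
    (i, x), (j, y) = task
    return (i, j), approx_mic(x, y)


def all_pairs_mic(columns):
    """Run the map step over every variable pair and collect the results."""
    tasks = combinations(enumerate(columns), 2)
    return dict(map(map_pair, tasks))
```

Because every pair is scored independently, the same `map_pair` function could in principle be handed to a parallel executor such as `multiprocessing.Pool.map`, or to mappers on a cluster, without changing the scoring code. That independence is what makes a near-linear speedup plausible for this kind of workload.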
