Information Theory Based Genome-Scale Gene Networks Construction Using MapReduce

2015 
Reverse-engineering genome-scale gene networks from gene expression data is a principal challenge in systems biology. Mutual information (MI) based methods are favored because they can recover non-linear relationships, have low algorithmic complexity, and have been used successfully in biological applications such as gene function prediction. In this paper, we present the first construction of MI-based genome-scale gene networks using MapReduce. We develop a solution for every stage of an MI-based network construction algorithm using only map and reduce operations on distributed datasets. Our solution is implemented in Spark, a framework that provides a compute and memory abstraction for distributed datasets. We deploy it on on-demand virtual instances of the Amazon EC2 cloud computing platform, demonstrating that a rent-by-the-hour ad hoc cluster can be used for this grand challenge problem in systems biology. Our implementation scales with the number of virtual instances and can construct networks of 2,000 to 5,000 genes within an hour in a cost-effective manner. We demonstrate the capability to construct genome-scale networks by reverse-engineering a network of over 17,000 genes for the widely studied model plant Arabidopsis thaliana.
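The map-and-reduce formulation described above can be illustrated with a minimal single-machine sketch. It is not the authors' Spark implementation: the equal-width binning estimator, the fixed MI threshold, and the names discretize, mutual_information, and build_network are all assumptions made for illustration, and Python's built-in map and functools.reduce stand in for the distributed operations.

```python
from functools import reduce
from itertools import combinations
from math import log


def discretize(values, bins=3):
    """Assign each expression value to an equal-width bin (assumed estimator)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant profiles fall into a single bin
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y):
    """Plug-in MI estimate (in nats) between two discretized profiles."""
    n = len(x)
    joint, px, py = {}, {}, {}
    for a, b in zip(x, y):
        joint[(a, b)] = joint.get((a, b), 0) + 1
        px[a] = px.get(a, 0) + 1
        py[b] = py.get(b, 0) + 1
    return sum(c / n * log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


def build_network(expression, threshold):
    """Score every gene pair with a map step, then fold the kept edges with a reduce step."""
    # Map 1: discretize each gene's expression profile independently.
    binned = dict(map(lambda kv: (kv[0], discretize(kv[1])), expression.items()))
    # Map 2: compute MI for every unordered gene pair.
    pairs = combinations(sorted(binned), 2)
    scored = map(lambda p: (p, mutual_information(binned[p[0]], binned[p[1]])), pairs)

    # Reduce: accumulate only edges whose MI reaches the (assumed) threshold.
    def keep(edges, item):
        (g1, g2), mi = item
        if mi >= threshold:
            edges.append((g1, g2, mi))
        return edges

    return reduce(keep, scored, [])


# Toy input: g1 and g2 vary together, g3 is constant.
expression = {
    "g1": [1, 2, 3, 4, 5, 6],
    "g2": [2, 4, 6, 8, 10, 12],
    "g3": [5, 5, 5, 5, 5, 5],
}
edges = build_network(expression, threshold=0.5)
```

In this toy input only the g1/g2 pair carries shared information, so it is the only edge kept. Because each pair is scored independently, the pairwise map step is the part that would be partitioned across workers in a distributed setting.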