An OpenMP algorithm and implementation for clustering biological graphs

2011 
Graph algorithms on parallel architectures present an interesting case study for irregular applications. Among the graph algorithms popular in scientific computing, graph clus tering or community detection has numerous applications in computational biology. However, this operation also poses serious computational challenges because of irregular memory access patterns, large memory requirements, and their dependence on other auxiliary (also irregular) data structures to supplement processing. In this paper, we address the problem of graph clustering on shared memory machines. We present a new OpenMP-based parallel algorithm called pClust-sm , which uses adjacency lists, hash tables and union-find data structures in parallel. The algorithm improves both the asymptotic runtime and memory complexities of a previous serial implementation. Preliminary results show that this algorithm can scale up to 8 threads (cores) of a shared memory machine on a real world metagenomics input graph with 1.2M vertices and 100M edges. More importantly, the new implementation drastically reduces the time to solution from the order of several hours to just over 4 minutes, and in addition, it enhances the problem size reach by at least one order of magnitude.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    28
    References
    10
    Citations
    NaN
    KQI
    []