GraphIVE: Heterogeneity-Aware Adaptive Graph Partitioning in GraphLab

2014 
GraphLab, a distributed graph-processing framework, has found multiple applications in data mining. Its scalability makes it a strong choice for running graph algorithms on large datasets. The current scheduler in GraphLab splits the graph according to one of several partitioning strategies. These strategies split the graph into approximately equal parts, which suits homogeneous clusters but is liable to perform poorly in the presence of heterogeneity. A number of challenges arise when the nodes differ in memory and processing power. We show that memory in particular can be a severe bottleneck, even leading to the termination of certain jobs. We determine the extent to which the current scheduler can handle heterogeneity. We further propose GraphIVE (Graph Processing In Varied Environments), a capability-aware graph partitioning policy for GraphLab applications. Moreover, GraphIVE continuously tries to reach optimum performance via hill climbing. We describe how GraphIVE reduces the communication overhead by reducing the replication factor of vertices. We implemented a prototype of GraphIVE and present preliminary results. GraphIVE significantly improves the execution time of jobs. The results also show how it allows for seamless graph processing on a heterogeneous cluster.
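The abstract does not give GraphIVE's algorithm, so the following is only a minimal sketch of the general idea it names: start from capability-proportional shares and refine them by hill climbing. The function names, the `step` size, and the `makespan` cost model (a stand-in for measured execution time) are all assumptions made for illustration and are not taken from the paper or from GraphLab.

```python
from typing import Callable, List, Tuple


def proportional_shares(capabilities: List[float]) -> List[float]:
    """Split the graph in proportion to each node's (hypothetical) capability score."""
    total = sum(capabilities)
    if total <= 0:
        raise ValueError("capabilities must sum to a positive value")
    return [c / total for c in capabilities]


def makespan(speeds: List[float]) -> Callable[[List[float]], float]:
    """Assumed cost model: a job finishes when the slowest node finishes."""
    def cost(shares: List[float]) -> float:
        return max(s / v for s, v in zip(shares, speeds))
    return cost


def hill_climb(
    shares: List[float],
    cost: Callable[[List[float]], float],
    step: float = 0.05,
    max_iters: int = 100,
) -> Tuple[List[float], float]:
    """Greedy hill climbing: shift `step` of the graph between two nodes
    whenever doing so strictly lowers the cost; stop when no move helps."""
    best = list(shares)
    best_cost = cost(best)
    for _ in range(max_iters):
        improved = False
        for src in range(len(best)):
            for dst in range(len(best)):
                if src == dst or best[src] < step:
                    continue  # skip no-op moves and moves that would go negative
                candidate = list(best)
                candidate[src] -= step
                candidate[dst] += step
                candidate_cost = cost(candidate)
                if candidate_cost < best_cost:
                    best, best_cost, improved = candidate, candidate_cost, True
        if not improved:
            break  # local optimum reached
    return best, best_cost


if __name__ == "__main__":
    # Two nodes, the second assumed three times faster than the first.
    start = [0.5, 0.5]  # equal split, as a homogeneous policy would choose
    tuned, tuned_cost = hill_climb(start, makespan([1.0, 3.0]))
    print(tuned, tuned_cost)
```

Because this climber accepts only strict improvements, it can stall on plateaus (for example, when two nodes tie for slowest); a real system would need a way around that, and would measure cost from actual runs rather than from a formula.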