Supporting Very Large Models using Automatic Dataflow Graph Partitioning

Minjie Wang, Chien-chin Huang, Jinyang Li

2019

This paper presents Tofu, a system that partitions very large DNN models across multiple GPU devices to reduce the per-GPU memory footprint. Tofu is designed to partition a dataflow graph of fine-grained tensor operators, as used by platforms such as MXNet and TensorFlow. To partition each operator automatically, we propose describing an operator's semantics in a simple language inspired by Halide. To partition the different operators in a dataflow graph optimally, Tofu uses a recursive search algorithm that minimizes the total communication cost. Our experiments on an 8-GPU machine show that Tofu enables the training of very large CNN and RNN models. It also achieves a 25% to 400% speedup over alternative approaches to training very large models.
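To make the idea of partitioning a single tensor operator concrete, the following is a minimal sketch in plain Python. It is not Tofu's description language or implementation. The function names (`matmul`, `partition_by_rows`, `partition_by_reduction`) are hypothetical, and the sketch assumes the partitioned dimension divides evenly among the workers. It shows two ways to split a matrix multiplication: along an output dimension, where each worker needs all of the second input, or along the reduction dimension, where each worker holds only a slice of both inputs but the partial outputs must be summed.

```python
def matmul(a, b):
    """Plain matrix multiply on nested lists: c[i][j] = sum over k of a[i][k] * b[k][j]."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def partition_by_rows(a, b, workers):
    """Split the output row dimension i.

    Each simulated worker gets a block of A's rows and a full copy of B,
    and produces the matching block of output rows.
    Assumes len(a) is divisible by workers.
    """
    step = len(a) // workers
    blocks = [matmul(a[w * step:(w + 1) * step], b) for w in range(workers)]
    # Concatenating the row blocks reassembles the full output.
    return [row for block in blocks for row in block]


def partition_by_reduction(a, b, workers):
    """Split the reduction dimension k.

    Each simulated worker gets a column slice of A and the matching row
    slice of B, and produces a full-size partial output.
    Assumes len(b) is divisible by workers.
    """
    step = len(b) // workers
    partials = []
    for w in range(workers):
        a_slice = [row[w * step:(w + 1) * step] for row in a]
        b_slice = b[w * step:(w + 1) * step]
        partials.append(matmul(a_slice, b_slice))
    # Summing the partial outputs elementwise yields the full result.
    rows, cols = len(partials[0]), len(partials[0][0])
    return [[sum(p[i][j] for p in partials) for j in range(cols)]
            for i in range(rows)]


a = [[1, 2], [3, 4], [5, 6], [7, 8]]   # 4 x 2
b = [[1, 0, 2], [0, 1, 3]]             # 2 x 3
assert partition_by_rows(a, b, 2) == matmul(a, b)
assert partition_by_reduction(a, b, 2) == matmul(a, b)
```

Both strategies compute the same result but move different data between workers, which is the kind of trade-off a search that minimizes total communication cost has to weigh for every operator in the graph.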

Keywords:

Parallel computing
Recursion
Memory footprint
Speedup
Dataflow
Computer science
Search algorithm
Distributed computing
Graph partition
Operator (computer programming)
Partition (number theory)
