Analyzing the distributed training of deep-learning models via data locality

2021 
In the last few years, deep-learning models have become crucial for numerous scientific and industrial applications. Due to the growing size and complexity of deep neural networks, researchers have been investigating techniques to train these networks more efficiently. Many efforts have been made to optimize deep-learning models by parallelizing or distributing their training computation across multiple devices. Current state-of-the-art techniques, such as Horovod, have been shown to maximize the performance of both the training computation and the inter-node communication across different deep-learning frameworks. However, some applications cannot take advantage of these techniques because of an I/O bottleneck caused by the input data, which limits the scalability of training. In this paper, we study an approach based on data locality, which has not yet been fully studied, for those neural networks that cannot benefit from scaling their computation due to a significant bottleneck in data I/O.
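For context, the data-parallel setup the abstract refers to typically looks like the following. This is a minimal sketch using Horovod's PyTorch binding; the toy model, dataset, and learning-rate scaling are illustrative assumptions, not the paper's actual configuration, and note that each worker still has to read its own shard of the input data, which is where the I/O bottleneck discussed above appears.

```python
# Minimal sketch of Horovod-style data-parallel training (PyTorch backend).
# The model, dataset, and hyperparameters below are illustrative placeholders,
# not the configuration studied in the paper.
import torch
import torch.nn as nn
import torch.utils.data as data
import horovod.torch as hvd


def train():
    hvd.init()                                   # one process per device
    if torch.cuda.is_available():
        torch.cuda.set_device(hvd.local_rank())  # pin each process to its local GPU

    # Toy in-memory dataset standing in for a real (I/O-bound) input pipeline.
    features = torch.randn(4096, 32)
    labels = torch.randint(0, 10, (4096,))
    dataset = data.TensorDataset(features, labels)

    # Each worker reads only its shard of the data (data-parallel input pipeline).
    sampler = data.distributed.DistributedSampler(
        dataset, num_replicas=hvd.size(), rank=hvd.rank())
    loader = data.DataLoader(dataset, batch_size=64, sampler=sampler)

    model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
    if torch.cuda.is_available():
        model.cuda()

    # Common heuristic: scale the learning rate with the number of workers.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())
    # Wrap the optimizer so gradients are averaged across workers via allreduce.
    optimizer = hvd.DistributedOptimizer(
        optimizer, named_parameters=model.named_parameters())

    # Start all workers from identical parameters and optimizer state.
    hvd.broadcast_parameters(model.state_dict(), root_rank=0)
    hvd.broadcast_optimizer_state(optimizer, root_rank=0)

    loss_fn = nn.CrossEntropyLoss()
    for epoch in range(3):
        sampler.set_epoch(epoch)                 # reshuffle shards each epoch
        for x, y in loader:
            if torch.cuda.is_available():
                x, y = x.cuda(), y.cuda()
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()
        if hvd.rank() == 0:
            print(f"epoch {epoch}: loss {loss.item():.4f}")


if __name__ == "__main__":
    train()
```

Such a script is launched with one process per device, for example `horovodrun -np 4 python train.py`; the computation and gradient exchange then scale with the number of workers, while the per-worker input reads remain a potential bottleneck.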