Scaling distributed deep learning workloads beyond the memory capacity with KARMA
2020
The dedicated memory of hardware accelerators can be insufficient to store all weights and/or intermediate states of large deep learning models. Although model parallelism is a viable approach to reducing this memory pressure, it requires significant source-code modification and algorithm-level considerations. An alternative solution is to use out-of-core methods instead of, or in addition to, data parallelism.
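As a rough illustration of the out-of-core idea, the sketch below (a hypothetical PyTorch snippet, not the paper's implementation) keeps a model's layers in host memory and swaps each one onto the accelerator only for the duration of its forward pass. The class name `SwappedSequential` and all other details are invented for illustration; a real out-of-core pipeline would also have to handle the backward pass and overlap transfers with computation.

```python
import torch
import torch.nn as nn

class SwappedSequential(nn.Module):
    """Keep layers resident in host (CPU) memory and move each layer to the
    accelerator only for its own forward computation, then move it back."""
    def __init__(self, layers, device=None):
        super().__init__()
        self.layers = nn.ModuleList(layers)  # parameters stay in host memory
        self.device = device or ("cuda" if torch.cuda.is_available() else "cpu")

    @torch.no_grad()  # forward-only demo; training needs backward-aware swapping
    def forward(self, x):
        x = x.to(self.device)
        for layer in self.layers:
            layer.to(self.device)   # swap this layer's weights onto the device
            x = layer(x)
            layer.to("cpu")         # swap them back out to free device memory
        return x

# Usage: only one layer's weights occupy device memory at any time.
model = SwappedSequential([nn.Linear(1024, 1024) for _ in range(8)])
print(model(torch.randn(4, 1024)).shape)  # torch.Size([4, 1024])
```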