Computational Resource Consumption in Convolutional Neural Network Training – A Focus on Memory

2021 
Deep neural networks (DNNs) have grown in popularity in recent years thanks to increases in computing power and in the size and relevance of data sets. This has made it possible to build more complex models and to extend DNNs into more areas of research and application. At the same time, the amount of data generated during the training of these models puts great pressure on the capacity and bandwidth of the memory subsystem and, as a direct consequence, has become one of the biggest bottlenecks for the scalability of neural networks. Optimizing the workloads that DNNs produce in the memory subsystem therefore requires a detailed understanding of memory access patterns and of the interactions between the processor, accelerator devices, and the system memory hierarchy. However, contrary to what might be expected, most DNN profilers work at a high level: they analyze only the model and the individual layers of the network, leaving aside the complex interactions among all the hardware components involved in training. This article presents the characterization of a convolutional neural network implemented in the two most popular frameworks, TensorFlow and PyTorch. The behavior of the component interactions is discussed by varying the batch size over two synthetic data sets, using results obtained with the profiler created for this study. The article also includes results from evaluating an AlexNet implementation in TensorFlow and shows that its behavior is similar to that of a basic CNN.
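To illustrate the batch-size sweep described in the abstract, the sketch below measures peak GPU memory while training a small CNN at several batch sizes in PyTorch. This is not the paper's profiler; the model, input shape, and batch sizes are illustrative assumptions, and only framework-level peak-allocation counters are used rather than the hardware-level interactions the study characterizes.

```python
import torch
import torch.nn as nn

# Minimal sketch (assumed model/sizes): peak GPU memory vs. batch size.
model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 10),  # 32x32 input -> 16x16 after pooling
).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

for batch_size in (32, 64, 128, 256):
    torch.cuda.reset_peak_memory_stats()
    x = torch.randn(batch_size, 3, 32, 32, device="cuda")   # synthetic inputs
    y = torch.randint(0, 10, (batch_size,), device="cuda")  # synthetic labels
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()  # activations retained for backprop dominate usage
    optimizer.step()
    peak_mib = torch.cuda.max_memory_allocated() / 2**20
    print(f"batch={batch_size:4d}  peak GPU memory={peak_mib:.1f} MiB")
```

Because activations are stored per sample for the backward pass, peak usage in such a sweep typically grows roughly linearly with batch size, which is one reason batch size is the natural knob for stressing memory capacity and bandwidth.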