
NPU Thermal Management

2020 
Neural Processing Units (NPUs) are becoming an integral part of modern computing systems because of their substantial role in accelerating Neural Networks (NNs). Their significant improvements in cost, energy, and performance stem from the massive array of multiply-accumulate (MAC) units, which remarkably boosts the throughput of NN inference. In this work, we are the first to investigate the thermal challenges that NPUs bring, revealing how MAC arrays, which form the heart of any NPU, impose serious thermal bottlenecks on on-chip systems due to their excessive power densities. For the first time, we explore 1) the effectiveness of precision scaling and frequency scaling in reducing temperature and 2) how advanced on-chip cooling using superlattice thin-film thermoelectric (TE) devices opens doors for new trade-offs between temperature, throughput, cooling cost, and inference accuracy in NPU chips. Our work reveals that hybrid thermal management, which combines different means of reducing the NPU temperature, is key. To achieve that, we propose and implement the PFS-TE technique, which couples Precision and Frequency Scaling with superlattice TE cooling for effective NPU thermal management. Using commercial signoff tools, we obtain accurate power and timing analysis of MAC arrays after a full chip design is performed based on 14nm Intel FinFET technology. Then, multi-physics simulations using finite element methods are carried out for accurate heat simulations in the presence and absence of on-chip cooling. Afterwards, a comprehensive design-space exploration is presented to demonstrate the Pareto frontier and the existing trade-offs between temperature reductions, power overheads due to cooling, throughput, and inference accuracy.
Using a wide range of NNs trained for image classification, experimental results demonstrate that our novel NPU thermal management increases the inference efficiency (TOPS/Joule) by 1.33x, 1.87x, and 2x under temperature constraints of 105°C, 85°C, and 70°C, respectively, while the average accuracy drops only from 89.0% to 85.5%.
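As an illustration of the kind of design-space exploration described above, the following minimal sketch filters a set of candidate configurations down to its Pareto frontier. It is not the authors' tooling: the `DesignPoint` type, the configuration names, and every numeric value are hypothetical placeholders chosen only to show the dominance test over temperature, cooling power, throughput, and accuracy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignPoint:
    """One hypothetical NPU configuration (all values are made up)."""
    name: str
    peak_temp_c: float       # lower is better
    cooling_power_w: float   # lower is better
    throughput_tops: float   # higher is better
    accuracy_pct: float      # higher is better


def _costs(p: DesignPoint) -> tuple:
    # Negate the "higher is better" metrics so every entry is minimized.
    return (p.peak_temp_c, p.cooling_power_w,
            -p.throughput_tops, -p.accuracy_pct)


def dominates(a: DesignPoint, b: DesignPoint) -> bool:
    """True if a is no worse than b on every metric and better on one."""
    ca, cb = _costs(a), _costs(b)
    no_worse = all(x <= y for x, y in zip(ca, cb))
    strictly_better = any(x < y for x, y in zip(ca, cb))
    return no_worse and strictly_better


def pareto_front(points: list) -> list:
    """Keep only the points that no other point dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points)]


# Hypothetical candidates, NOT results from the paper.
POINTS = [
    DesignPoint("baseline",         120.0, 0.0, 10.0, 89.0),
    DesignPoint("freq_scaled",      100.0, 0.0,  7.0, 89.0),
    DesignPoint("precision_scaled", 105.0, 0.0, 10.0, 86.0),
    DesignPoint("te_cooled",        100.0, 2.0, 10.0, 89.0),
    DesignPoint("bad_hybrid",       106.0, 2.5,  7.0, 86.0),
]

if __name__ == "__main__":
    for p in pareto_front(POINTS):
        print(p.name)
```

In this toy data set only `bad_hybrid` is dominated (by `precision_scaled`, which is cooler, needs no cooling power, and is faster at the same accuracy); the remaining points each trade one metric against another and so stay on the frontier.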