A High-Throughput Neural Network Accelerator

2015 
Machine-learning tasks are becoming pervasive in a broad range of domains and systems, from embedded systems to datacenters. Recent advances in machine learning show that neural networks are the state of the art across many applications. As architectures evolve toward heterogeneous multicores comprising a mix of cores and accelerators, a neural network accelerator can achieve the rare combination of efficiency (due to the small number of target algorithms) and broad application scope. Until now, most machine-learning accelerator designs have focused on efficiently implementing the computational part of the algorithms. However, recent state-of-the-art neural networks are characterized by their large size. The authors designed an accelerator architecture for large-scale neural networks, with a special emphasis on the impact of memory on accelerator design, performance, and energy. In this article, they present a concrete design at 65 nm that can perform 496 16-bit fixed-point operations in parallel every 1.02 ns, that is, 452 GOP/s, in a 3.02 mm², 485-mW footprint (excluding main memory accesses).
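As a quick back-of-envelope sketch (not from the paper itself), the quoted figures can be related to one another: 496 operations per 1.02-ns cycle gives the raw peak rate, and dividing the reported 452 GOP/s by the 485-mW power gives an energy-efficiency figure. The variable names below are illustrative.

```python
# Sanity-check the abstract's throughput and efficiency figures.
# All inputs are taken from the abstract; the derived quantities are mine.
ops_per_cycle = 496     # 16-bit fixed-point operations in parallel
cycle_time_ns = 1.02    # nanoseconds per cycle
power_w = 0.485         # 485 mW
reported_gops = 452     # GOP/s as stated in the abstract

# ops per nanosecond is numerically equal to GOP/s (10^9 ops per second).
peak_gops = ops_per_cycle / cycle_time_ns
efficiency_gops_per_w = reported_gops / power_w

print(f"raw peak: {peak_gops:.1f} GOP/s (reported sustained: {reported_gops} GOP/s)")
print(f"energy efficiency: {efficiency_gops_per_w:.0f} GOP/s per watt")
```

The raw peak works out to roughly 486 GOP/s, so the reported 452 GOP/s sits a little below it, and the efficiency to roughly 930 GOP/s per watt (excluding main memory accesses, per the abstract's caveat).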