A Scalable Multi-TeraOPS Deep Learning Processor Core for AI Training and Inference

Bruce M. Fleischer, Sunil Shukla, Matthew M. Ziegler, Joel Abraham Silberman, Jinwook Oh, Vijayalakshmi Srinivasan, Jungwook Choi, Silvia Melitta Mueller, Ankur Agrawal, Tina Babinsky, Nianzheng Cao, Chia-Yu Chen, Pierce Chuang, Thomas W. Fox, George Gristede, Michael A. Guillorn, Howard M. Haynie, Michael Klaiber, Dongsoo Lee, Shih-Hsien Lo, Gary W. Maier, Michael Scheuermann, Swagath Venkataramani, Christos Vezyrtzis, Naigang Wang, Fanchieh Yee, Ching Zhou, Pong-Fei Lu, Brian W. Curran, Leland Chang, Kailash Gopalakrishnan

2018

A multi-TOPS AI core is presented for acceleration of deep learning training and inference in systems from edge devices to data centers. With a programmable architecture and custom ISA, this engine achieves >90% sustained utilization across the range of neural network topologies by employing a dataflow architecture and an on-chip scratchpad hierarchy. Compute precision is optimized at 16b floating point (fp16) for high model accuracy in training and inference as well as 1b/2b (binary/ternary) integer for aggressive inference performance. At 1.5 GHz, the AI core prototype achieves 1.5 TFLOPS fp16, 12 TOPS ternary, or 24 TOPS binary peak performance in 14nm CMOS.
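
The peak figures quoted above can be sanity-checked with simple arithmetic. The sketch below divides each peak throughput by the 1.5 GHz clock to get the implied operations per cycle, and compares each mode against fp16. Only the clock and the three peak numbers come from the abstract; the names `CLOCK_HZ`, `PEAK_OPS_PER_S`, and `ops_per_cycle` are illustrative, and the result says nothing about how the core actually organizes its compute units.

```python
# Figures taken from the abstract; everything else here is illustrative.
CLOCK_HZ = 1.5e9  # 1.5 GHz

# Peak throughput per precision mode, in operations per second.
PEAK_OPS_PER_S = {
    "fp16": 1.5e12,     # 1.5 TFLOPS
    "ternary": 12e12,   # 12 TOPS
    "binary": 24e12,    # 24 TOPS
}


def ops_per_cycle(peak_ops_per_s, clock_hz=CLOCK_HZ):
    """Operations per clock cycle implied by a peak throughput figure."""
    return peak_ops_per_s / clock_hz


# Implied per-cycle throughput for each mode.
per_cycle = {name: ops_per_cycle(peak) for name, peak in PEAK_OPS_PER_S.items()}

# Peak throughput of each mode relative to fp16.
ratio_vs_fp16 = {
    name: peak / PEAK_OPS_PER_S["fp16"] for name, peak in PEAK_OPS_PER_S.items()
}

for name in PEAK_OPS_PER_S:
    print(f"{name}: {per_cycle[name]:.0f} ops/cycle, "
          f"{ratio_vs_fp16[name]:.0f}x fp16 peak")
```

Under these numbers the core would need on the order of 1,000 fp16 operations per cycle at peak, with the ternary and binary modes offering 8x and 16x the fp16 peak rate, respectively.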

Keywords:

Parallel computing
Computer science
Deep learning
Floating point
Artificial neural network
Edge device
Dataflow architecture
Multi-core processor
Inference
Scalability
Artificial intelligence

