A 19.4-nJ/Decision, 364-K Decisions/s, In-Memory Random Forest Multi-Class Inference Accelerator
2018
This paper presents an integrated circuit (IC) realization of a random forest (RF) machine learning classifier in 65-nm CMOS. Algorithm, architecture, and circuits are co-optimized to achieve aggressive energy and delay benefits by exploiting the inherent error resiliency that an RF classifier derives from its ensemble nature. Deterministic sub-sampling (DSS) and regularized decision trees reduce interconnect complexity and avoid irregular memory access patterns and computations, thereby reducing the energy-delay product (EDP). The prototype IC also embeds low-swing analog in-memory computations in a standard 6T SRAM to enable massively parallel tree node comparisons, which minimizes memory fetches and reduces the EDP further. Compared with a conventional digital system operating at the same accuracy, the 65-nm CMOS prototype achieves 3.1× higher energy efficiency and 2.2× higher throughput, yielding a 6.8× lower EDP, on two tasks: 1) eight-class traffic sign recognition (94% accuracy) and 2) face detection (97.5% accuracy).
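The sketch below is a software illustration of why regularized trees and deterministic sub-sampling give a regular, parallel-friendly data flow. It is not the authors' design: the prototype performs node comparisons in analog inside a 6T SRAM, whereas this code does them in plain Python. Everything named here is hypothetical and chosen for illustration: the fixed depth `DEPTH = 3`, the strided feature pattern in `dss_indices` (tree t reads features t, t+T, t+2T, ...), the flat-array layout in `RegularTree`, the majority vote in `forest_predict`, the untrained parameters from `build_random_forest`, and the helper `edp`, which assumes delay per decision equals 1/throughput. Because every tree is a complete binary tree of the same depth, all node comparisons can be evaluated at once and the root-to-leaf walk then needs only the resulting bits.

```python
import random
from collections import Counter

DEPTH = 3                   # assumed fixed depth shared by every tree ("regularized" trees)
N_INTERNAL = 2**DEPTH - 1   # internal (comparison) nodes in a complete binary tree
N_LEAVES = 2**DEPTH         # leaves, each storing a class label


def dss_indices(num_features, tree_idx, num_trees):
    """Hypothetical deterministic sub-sampling: tree t reads features t, t+T, t+2T, ..."""
    return list(range(tree_idx, num_features, num_trees))


class RegularTree:
    """Complete binary tree in flat arrays; node i has children 2i+1 (left) and 2i+2 (right)."""

    def __init__(self, feat, thresh, leaf_class):
        self.feat = list(feat)              # per-node index into the tree's sub-sampled input
        self.thresh = list(thresh)          # per-node comparison threshold
        self.leaf_class = list(leaf_class)  # class label at each leaf

    def predict(self, x_sub):
        # Evaluate every node comparison up front (software stand-in for parallel in-memory compares).
        bits = [x_sub[f] > th for f, th in zip(self.feat, self.thresh)]
        # Walk root-to-leaf using only the precomputed bits; no further input fetches are needed.
        node = 0
        for _ in range(DEPTH):
            node = 2 * node + 1 + int(bits[node])
        return self.leaf_class[node - N_INTERNAL]


def forest_predict(trees, x):
    """Majority vote; tree t sees only its deterministic sub-sample of x. Ties go to the first class voted."""
    votes = []
    for t, tree in enumerate(trees):
        x_sub = [x[i] for i in dss_indices(len(x), t, len(trees))]
        votes.append(tree.predict(x_sub))
    return Counter(votes).most_common(1)[0][0]


def build_random_forest(num_trees, num_features, num_classes, seed=0):
    """Random, untrained parameters, used only to exercise the data layout."""
    rng = random.Random(seed)
    trees = []
    for t in range(num_trees):
        n_sub = len(dss_indices(num_features, t, num_trees))
        trees.append(RegularTree(
            [rng.randrange(n_sub) for _ in range(N_INTERNAL)],
            [rng.random() for _ in range(N_INTERNAL)],
            [rng.randrange(num_classes) for _ in range(N_LEAVES)],
        ))
    return trees


def edp(energy_j, decisions_per_s):
    """EDP = energy per decision x delay per decision, with delay assumed to be 1/throughput."""
    return energy_j / decisions_per_s


if __name__ == "__main__":
    trees = build_random_forest(num_trees=4, num_features=64, num_classes=8)
    x = [random.Random(1).random() for _ in range(64)]
    print("predicted class:", forest_predict(trees, x))
    print("EDP from the title figures (J*s):", edp(19.4e-9, 364e3))  # about 5.33e-14
    print("EDP gain implied by 3.1x energy and 2.2x throughput:", 3.1 * 2.2)  # about 6.82
```

The last line also shows that the reported 6.8× EDP improvement is consistent with the product of the 3.1× energy-efficiency and 2.2× throughput gains, since EDP scales with energy per decision times time per decision.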