A 19.4-nJ/Decision, 364-K Decisions/s, In-Memory Random Forest Multi-Class Inference Accelerator

2018 
This paper presents an integrated circuit (IC) realization of a random forest (RF) machine learning classifier in 65-nm CMOS. Algorithm, architecture, and circuits are co-optimized to achieve aggressive energy and delay benefits by exploiting the inherent error resiliency that comes from the ensemble nature of an RF classifier. Deterministic sub-sampling (DSS) and regularized decision trees reduce interconnect complexity and avoid irregular memory access patterns and computations, thereby reducing the energy-delay product (EDP). The prototype IC also employs low-swing analog in-memory computations embedded in a standard 6T SRAM to enable massively parallel tree node comparisons, which minimizes memory fetches and reduces the EDP further. Compared to a conventional digital system, the prototype IC achieves $3.1{\times }$ higher energy efficiency and $2.2{\times }$ higher throughput, leading to $6.8{\times }$ lower EDP, at the same accuracies of 94% and 97.5% for two tasks: 1) eight-class traffic sign recognition and 2) face detection, respectively.
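The headline figures can be cross-checked with a few lines of arithmetic. The sketch below is illustrative and not taken from the paper. It assumes that delay per decision scales as the inverse of throughput, so the EDP improvement is the product of the energy and throughput gains, and that the two title figures (19.4 nJ/decision and 364 K decisions/s) describe the same operating point. All variable names are hypothetical.

```python
# Figures quoted in the title and abstract.
energy_gain = 3.1        # energy-efficiency improvement over the digital baseline
throughput_gain = 2.2    # throughput improvement over the digital baseline

# Assumption: delay per decision scales as 1 / throughput, so the
# EDP (energy x delay) improvement is the product of the two gains.
edp_gain = energy_gain * throughput_gain

energy_per_decision_j = 19.4e-9   # 19.4 nJ/decision (title)
decisions_per_second = 364e3      # 364 K decisions/s (title)

# Assumption: both title figures describe the same operating point,
# so their product is the implied average power.
average_power_w = energy_per_decision_j * decisions_per_second

print(f"EDP improvement: {edp_gain:.2f}x")
print(f"Implied average power: {average_power_w * 1e3:.2f} mW")
```

Under these assumptions, the product of the two gains is about 6.8, consistent with the EDP figure in the abstract, and the implied average power is roughly 7 mW.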