A 35.6 TOPS/W/mm$^2$ 3-Stage Pipelined Computational SRAM with Adjustable Form Factor for Highly Data-Centric Applications

2020 
In highly data-centric applications, bringing computation closer to storage should significantly reduce energy-hungry data movement. This letter proposes a computational SRAM (C-SRAM) that combines in- and near-memory computing (IMC/NMC) approaches and is intended to be used by a scalar processor as an energy-efficient vector processing unit. Parallel computation is thus performed on vectorized integer data packed in wide words, using standard logic and arithmetic operators. Furthermore, multiple rows can be activated simultaneously to increase this parallelism. The proposed C-SRAM is designed with a two-port pushed-rule foundry bitcell, available in most existing design platforms, and an adjustable form factor to facilitate physical implementation in an SoC. The 4-kB C-SRAM testchip with 128-b words, manufactured in 22-nm FD-SOI process technology, displays a subarray efficiency of 72% and an additional computing area of less than 5%. Measurements averaged over 10 dies at 0.85 V and 1 GHz demonstrate an energy efficiency per unit area of 35.6 and 1.48 TOPS/W/mm$^2$ for 8-b additions and multiplications, with 3- and 24-ns computing latency, respectively. Compared to a 128-b SIMD processor architecture, up to $2\times$ energy reduction and $1.8\times$ speed-up are achievable for a representative set of highly data-centric application kernels.
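To make the word-level parallelism concrete, the following is a minimal software sketch of lane-wise 8-b addition on a 128-b word, the operand and word widths quoted in the abstract. It is only an illustration, not the C-SRAM circuit: the function names (`pack`, `unpack`, `vec_add8`, `latency_cycles`) are hypothetical, and the wraparound overflow behavior is an assumption, since the abstract does not say how overflow is handled. The last helper converts the quoted latencies into clock cycles at the stated 1 GHz.

```python
WORD_BITS = 128                  # word width quoted in the abstract
LANE_BITS = 8                    # operand width quoted in the abstract
LANES = WORD_BITS // LANE_BITS   # number of 8-b operands per word
LANE_MASK = (1 << LANE_BITS) - 1


def pack(values):
    """Pack LANES 8-b values into one integer word, lane 0 in the low bits."""
    if len(values) != LANES:
        raise ValueError(f"expected {LANES} lanes, got {len(values)}")
    word = 0
    for i, v in enumerate(values):
        word |= (v & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split one integer word back into a list of LANES 8-b values."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def vec_add8(a, b):
    """Lane-wise 8-b addition; carries do not cross lane boundaries.

    Assumption: each lane wraps around modulo 256 on overflow.
    """
    return pack([(x + y) & LANE_MASK for x, y in zip(unpack(a), unpack(b))])


def latency_cycles(latency_ns, freq_ghz=1.0):
    """Convert a latency in nanoseconds to clock cycles at the given frequency."""
    return round(latency_ns * freq_ghz)
```

With these definitions, one word carries 16 independent 8-b operands, so a single word-wide addition performs 16 additions at once. At 1 GHz the quoted 3-ns and 24-ns latencies correspond to 3 and 24 clock cycles for addition and multiplication, respectively.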