An Automatic-Addressing Architecture with Fully Serialized Access in Racetrack Memory for Energy-Efficient CNNs

2020 
Racetrack memory, an emerging low-power magnetic memory, is a promising replacement for traditional memory in accelerators. However, random access in racetrack memory is costly in both time and energy for CNN accelerators because it incurs a large number of invalid shifts. In this work, we propose an automatic-addressing architecture that builds a novel data layout guaranteeing that the next round of memory access can always be satisfied at the in-situ cell or a strictly adjacent cell of the current round. This produces a fully serialized access footprint that drives instant port alignment without any invalid shifts in racetrack memory. In this way, the original address-based access reduces to repeated selection among three candidates, i.e., one in-situ cell and two neighboring cells. Based on this simplification, a lightweight access manager can generate the sequence of one-out-of-three selections from the deterministic access behavior defined by the CNN hyper-parameters. The evaluation shows that, when five popular CNN applications are deployed on our architecture, the physical shifts in racetrack memory are reduced by 74.64% relative to the legacy layout, yielding energy reductions of 54.2% on reads and 42.1% on writes. A case study of YOLOv2 indicates that our architecture achieves 6.503 GOp/J, an 18.5× improvement over server-level GPUs.
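To make the one-out-of-three selection idea concrete, the following is a minimal sketch and not the paper's implementation. It assumes a toy model of a single track with one access port, where the shift cost is the distance in cells between consecutive accesses. The names `shift_count`, `to_selections`, and `replay` are hypothetical and introduced only for illustration.

```python
from typing import List

# The three candidate selections (assumed encoding): stay in situ,
# or move to the left or right neighbor cell.
STAY, LEFT, RIGHT = 0, -1, 1


def shift_count(addresses: List[int], start: int = 0) -> int:
    """Total single-cell shifts needed to align the port with each address in turn."""
    pos, total = start, 0
    for addr in addresses:
        total += abs(addr - pos)
        pos = addr
    return total


def to_selections(addresses: List[int], start: int = 0) -> List[int]:
    """Encode a fully serialized address trace as one-out-of-three selections.

    Raises ValueError if any access is more than one cell away from the
    previous one, i.e. if the trace is not fully serialized.
    """
    pos, selections = start, []
    for addr in addresses:
        step = addr - pos
        if step not in (STAY, LEFT, RIGHT):
            raise ValueError(f"access {addr} is not adjacent to cell {pos}")
        selections.append(step)
        pos = addr
    return selections


def replay(selections: List[int], start: int = 0) -> List[int]:
    """Recover the visited cells from the selection sequence alone (no addresses)."""
    pos, visited = start, []
    for sel in selections:
        pos += sel
        visited.append(pos)
    return visited


serialized = [0, 1, 1, 2, 3, 2]   # every step stays or moves one cell
scattered = [0, 5, 1, 7]          # address-based access with long jumps

print(shift_count(serialized))    # each access costs at most one shift
print(shift_count(scattered))     # long jumps shift past unneeded cells
print(to_selections(serialized))
```

In this toy model, a serialized trace needs at most one shift per access and can be replayed from the selection sequence alone, which is the property that lets a lightweight manager replace full address decoding.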