Compute cache for data parallel acceleration

2019 
The talk will start with an overview of our Neural Cache architecture, which can fully execute convolutional, fully connected, and pooling layers in-cache and also supports in-cache quantization. I will then present a versatile Compute Cache architecture named Duality Cache, which re-purposes cache structures to transform them into massively parallel compute units capable of running arbitrary data-parallel workloads, including Deep Neural Networks. Our work presents a holistic approach to building the Compute Cache system stack: techniques for performing in-cache floating-point and fixed-point arithmetic and transcendental functions, an SIMT execution model, a compiler that accepts existing CUDA programs, and flexibility in adapting to varied workload characteristics. Exposing the massive parallelism available in the Duality Cache architecture improves performance on GPU benchmarks by 3.6x and on OpenACC benchmarks by 3.2x over a server-class GPU. Re-purposing existing caches yields 72.6x better performance for the CPU at only 3.5% area cost. Duality Cache reduces energy by 5.2x over the GPU and 20x over the CPU.
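
To make the target programming model concrete: the abstract states that the Duality Cache compiler accepts existing CUDA programs and maps their SIMT threads onto in-cache compute units. The sketch below is not from the talk; it is a canonical data-parallel CUDA kernel (element-wise vector addition) of the kind such a compiler would consume unchanged, with each SIMT lane handling one element.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Canonical data-parallel kernel: one SIMT thread per element.
// A Compute Cache compiler would map these lanes onto parallel
// compute units formed from cache SRAM arrays instead of GPU cores.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    vecAdd<<<(n + 255) / 256, 256>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);  // expect 3.000000
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

The benchmark numbers quoted above come from running such unmodified data-parallel programs, so the claimed speedups reflect the compiler and in-cache arithmetic together rather than hand-tuned kernels.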