An Optimal Microarchitecture for Stencil Computation Acceleration Based on Nonuniform Partitioning of Data Reuse Buffers
2016
High-level synthesis (HLS) tools have made significant progress in compiling high-level descriptions of computation into highly pipelined register-transfer level specifications. The high-throughput computation raises a high data demand. To prevent data accesses from being the bottleneck, on-chip memories are used as data reuse buffers to reduce off-chip accesses. Also memory partitioning is explored to increase the memory bandwidth by scheduling multiple simultaneous memory accesses to different memory banks. Prior work on memory partitioning of data reuse buffers is limited to uniform partitioning. In this paper, we perform an early-stage exploration of nonuniform memory partitioning. We use the stencil computation, a popular communication-intensive application domain, as a case study to show the potential benefits of nonuniform memory partitioning. Our novel method can always achieve the minimum memory size and the minimum number of memory banks, which cannot be guaranteed in any prior work. We develop a generalized microarchitecture to decouple stencil accesses from computation, and an automated design flow to integrate our microarchitecture with the HLS-generated computation kernel for a complete accelerator.
Keywords:
- Correction
- Source
- Cite
- Save
- Machine Reading By IdeaReader
18
References
16
Citations
NaN
KQI