Exploiting parallelism of imperfect nested loops with sibling inner loops on coarse-grained reconfigurable architectures
2016
The coarse-grained reconfigurable architecture (CGRA) is a promising platform for loop acceleration, but existing software pipelining methods cannot achieve satisfactory performance on many imperfect nested loops, especially those with sibling inner loops. To tackle this problem, this paper makes two contributions: 1) a two-level pipelining method with an effective initiation interval (II) optimization strategy for imperfect loops with sibling inner loops; 2) a novel kernel compression method to reduce oversized kernels. Experimental results show that our approach achieves much higher performance than state-of-the-art approaches at acceptable cost.
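As an illustration of the loop shape discussed above, the following minimal sketch shows an imperfect nested loop with two sibling inner loops. It is not taken from the paper: the function `normalize_rows` and its variable names are hypothetical, chosen only to make the structure concrete. The nest is imperfect because the outer loop body contains statements outside any inner loop, and the two inner loops are siblings because they sit at the same nesting level.

```python
def normalize_rows(matrix):
    """Scale each row so its entries sum to 1 (rows summing to 0 are left as-is)."""
    result = []
    for row in matrix:                  # outer loop
        total = 0.0                     # statement outside any inner loop
        for value in row:               # first inner loop: reduce the row
            total += value
        # Statement between the two sibling loops; the second loop depends on it.
        scale = 1.0 / total if total != 0 else 1.0
        scaled = []
        for value in row:               # second inner loop: sibling of the first
            scaled.append(value * scale)
        result.append(scaled)
    return result
```

The second inner loop cannot start until the first has produced `total`, so the two siblings cannot simply be fused into a single perfect nest. This is the kind of structure that a pipelining method restricted to the innermost loop handles poorly.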