Exploiting parallelism of imperfect nested loops with sibling inner loops on coarse-grained reconfigurable architectures

2016 
Coarse-grained reconfigurable architecture (CGRA) is a promising platform for loop acceleration, but existing software pipelining methods cannot achieve satisfactory performance on many imperfect nested loops, especially those with sibling inner loops. To tackle this problem, this paper makes two contributions: 1) a two-level pipelining method with an effective initiation interval (II) optimization strategy for imperfect loops with sibling inner loops; 2) a novel kernel compression method to reduce oversized kernels. Experimental results show that our approach achieves much higher performance than state-of-the-art approaches at acceptable cost.
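
For readers unfamiliar with the loop shape the abstract targets, the following is a minimal illustrative sketch in C, not code from the paper. The function `center_rows` and its parameters are hypothetical names chosen for this example. The outer loop is "imperfect" because its body contains statements outside the inner loops, and the two inner loops are "siblings" because they sit side by side at the same nesting level.

```c
#include <stddef.h>

/* Hypothetical example kernel (not from the paper): subtract each row's
 * mean from that row of a rows-by-cols matrix stored in row-major order.
 * Assumes cols > 0. */
void center_rows(size_t rows, size_t cols, double *a, double *row_mean)
{
    for (size_t i = 0; i < rows; i++) {          /* outer loop */
        double s = 0.0;                          /* statement outside the inner loops */

        for (size_t j = 0; j < cols; j++)        /* first sibling inner loop */
            s += a[i * cols + j];

        row_mean[i] = s / (double)cols;          /* statement between the siblings */

        for (size_t j = 0; j < cols; j++)        /* second sibling inner loop */
            a[i * cols + j] -= row_mean[i];
    }
}
```

Because the second inner loop depends on a value produced after the first one finishes, pipelining only a single innermost loop would leave the rest of the outer iteration unoverlapped, which is the kind of structure the abstract describes as hard for existing methods.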