Deepframe: A Profile-Driven Compiler for Spatial Hardware Accelerators

2019 
Tracing code paths to form extended basic blocks is useful in many areas, including compiler optimizations [1], improving instruction cache behavior [2], and custom-hardware offloading [3]. Prior work has been plagued by small traces, limited by the overheads of dynamic profiling, by the statically available information [4], or by side-exit branches [5]. In this work, we rethink which code path sequences to fuse and construct long traces for offloading to spatial accelerators, while minimizing the occurrence of side exits, which limit dynamic coverage. We introduce a novel technique that recasts learning a program's execution patterns as a natural-language-processing problem, modeled with CBOW (Continuous Bag of Words). We then use a deep learning network to learn the relationships among paths. During the compilation phase, the compiler uses a sequence miner to decide which paths are likely to occur. The learning network predicts a Deepframe online; a Deepframe is an extended basic block comprising a multi-path sequence, where each path is itself composed of multiple basic blocks. We demonstrate the efficacy of Deepframe on spatial hardware accelerators and find the following: i) Deepframe can construct up to 5x (max: 27x) longer offload regions compared to prior approaches. ii) Surprisingly far-flung ILP (instruction-level parallelism) and MLP (memory-level parallelism) can be mined from the frames statically (5.5x increase in ILP and 10.5x increase in MLP). iii) The frames offloaded to the spatial accelerator have minimal side exits (mis-speculation) and achieve sufficient dynamic coverage to improve overall application performance (up to 9x improvement). We will release our end-to-end compiler prototype, based on LLVM, as open source.
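To make the CBOW framing concrete, the sketch below shows one way a profiled sequence of path IDs could be turned into (context, target) training pairs, treating each path as a "word" and the execution trace as a "sentence". The function name `cbow_pairs`, the path IDs, and the window size are hypothetical and chosen only for illustration; the abstract does not specify the encoding or the network that the Deepframe compiler actually uses.

```python
def cbow_pairs(trace, window=2):
    """Build CBOW-style (context, target) pairs from a path trace.

    Assumption: each element of `trace` is an opaque path ID, and the
    context of a path is the `window` paths executed before and after it.
    """
    trace = list(trace)
    pairs = []
    for i, target in enumerate(trace):
        left = trace[max(0, i - window):i]      # paths executed just before
        right = trace[i + 1:i + 1 + window]     # paths executed just after
        context = tuple(left + right)
        if context:                             # skip a single-path trace
            pairs.append((context, target))
    return pairs


# Hypothetical trace: path p1 recurs between p0 and p2.
example = cbow_pairs(["p0", "p1", "p2", "p1"], window=1)
```

A model trained on such pairs learns which paths tend to occur together, which is the kind of relationship a frame constructor could use to decide which paths to fuse.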