Determining Optimal Coherency Interface for Many-Accelerator SoCs Using Bayesian Optimization

2019 
The modern system-on-chip (SoC) of the current exascale computing era is complex. These SoCs not only consist of several general-purpose processing cores but also integrate many specialized hardware accelerators. Three common coherency interfaces are used to integrate the accelerators with the memory hierarchy: non-coherent,coherent with the last-level cache (LLC), and fully-coherent.However, using a single coherence interface for all the accelerators in an SoC can lead to significant overheads: in the non-coherent model, accelerators directly access the main memory, which can have considerable performance penalty; whereas in the LLC-coherent model, the accelerators access the LLC but may suffer from performance bottleneck due to contention between several accelerators; and the fully-coherent model, that relies on private caches, can incur non-trivial power/area overheads. Given the limitations of each of these interfaces, this paper proposes a novel performance-aware hybrid coherency interface, where different accelerators use different coherency models, decided at design time based on the target applications so as to optimize the overall system performance. A new Bayesian optimization based framework is also proposed to determine the optimal hybrid coherency interface, i.e., use machine learning to select the best coherency model for each of the accelerators in the SoC in terms of performance. For image processing and classification workloads, the proposed framework determined that a hybrid interface achieves up to 23 percent better performance compared to the other ’homogeneous‘ interfaces, where all the accelerators use a single coherency model.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    12
    References
    8
    Citations
    NaN
    KQI
    []