Enabling Efficient Job Dispatching in Accelerator-Extended Heterogeneous Systems with Unified Address Space

2018 
In addition to GPUs, which see increasingly widespread use for general-purpose computing, special-purpose accelerators are widely deployed for their high efficiency and low power consumption. Attached to general-purpose CPUs, they form Heterogeneous System Architectures (HSAs). This paper presents a new communication model for heterogeneous computing that provides a unified memory space for CPUs and accelerators and removes the requirement for virtual-to-physical address translation through an I/O Memory Management Unit (IOMMU), thereby easing the adoption of Heterogeneous System Architectures in SoCs that do not include an IOMMU, which still represent a large share of real products. By exploiting user-level queuing, workload dispatching to specialized hardware accelerators eliminates the overhead of copying objects through operating-system calls. Additionally, dispatching is structured around fixed-size packets whose management is accelerated by specialized hardware logic. To further eliminate IOMMU performance loss and IOMMU management complexity, we propose placing accelerator data directly in contiguous system-memory regions, where the dispatcher provides transparent access to the accelerators while offering a simple programming abstraction to the application. Implementing architectural support on a low-cost embedded System-on-Chip, we demonstrate dispatching rates that exceed ten thousand jobs per second, bounded only by the computing capacity of the hardware accelerators.