Runtime Support for Adaptive Spatial Partitioning and Inter-Kernel Communication on GPUs
2014
GPUs have gained tremendous popularity in a broad range of application domains. These applications possess varying grains of parallelism and place high demands on compute resources -- often imposing real-time constraints, requiring flexible work schedules, and relying on concurrent execution of multiple kernels on the device. These requirements present a number of challenges when targeting current GPUs. To support this class of applications, and to take full advantage of the large number of compute cores present on the GPU, we need a new mechanism to support concurrent execution and provide flexible mapping of compute kernels to the GPU. In this paper, we describe a new scheduling mechanism for dynamic spatial partitioning of the GPU, which adapts to the current execution state of compute workloads on the device. To enable this functionality, we extend the OpenCL runtime environment to map multiple command queues to a single device, effectively partitioning the device. The result is that kernels that can benefit from concurrent execution on a partitioned device can fully utilize the compute resources of the GPU. To accelerate next-generation workloads, we also support an inter-kernel communication mechanism that enables concurrent kernels to interact in a producer-consumer relationship. The proposed partitioning mechanism is evaluated using real-world applications taken from the signal and image processing, linear algebra, and data mining domains. For these performance-hungry applications we achieve a 3.1X speedup using a combination of the proposed scheduling scheme and inter-kernel communication, versus relying on the conventional GPU runtime.
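To make the partitioning idea concrete, the sketch below simulates one plausible policy for spatially partitioning a device: dividing a GPU's compute units among concurrently queued kernels in proportion to their workload sizes. This is an illustrative model only -- the function name, the proportional policy, and the one-compute-unit minimum are assumptions for exposition, not the paper's actual scheduling algorithm.

```python
# Illustrative sketch (not the paper's algorithm): proportionally partition a
# GPU's compute units (CUs) among concurrently queued kernels, so that each
# command queue effectively owns a spatial slice of the device.

def partition_compute_units(total_cus, workloads):
    """Split total_cus among kernels proportionally to their workload sizes.

    Each kernel is guaranteed at least one compute unit, so the number of
    concurrent kernels must not exceed the number of compute units.
    Returns a list of per-kernel CU allocations summing to total_cus.
    """
    if not workloads:
        return []
    assert len(workloads) <= total_cus, "more kernels than compute units"
    total = sum(workloads)
    # Initial proportional shares, floored, with a minimum of one CU each.
    shares = [max(1, (total_cus * w) // total) for w in workloads]
    # Repair rounding error so the shares sum exactly to total_cus.
    i = 0
    while sum(shares) > total_cus:
        if shares[i] > 1:
            shares[i] -= 1
        i = (i + 1) % len(shares)
    while sum(shares) < total_cus:
        shares[shares.index(min(shares))] += 1
    return shares
```

For example, two equally sized kernels on a 32-CU device would each receive 16 compute units, while a 3:1 workload imbalance would yield a 24/8 split; a real adaptive runtime would recompute such a partition as kernels launch and retire.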