GPU spatial multitasking has proven effective at executing different applications concurrently through streaming multiprocessor (SM) partitioning. However, while it maximizes total throughput, latency-critical applications often miss their deadlines because their execution time increases. Furthermore, SM partitioning cannot allocate an appropriate L1 cache size to each kernel. To solve these problems, this paper proposes GPU Fine-Tuner, a new application-aware resource allocation framework that assigns appropriate resources to GPU kernels. To minimize the execution time of latency-constrained applications, it assigns them more SMs when this does not affect performance. It also enlarges the cache available to cache-sensitive kernels by borrowing resources from neighboring SMs that run cache-insensitive kernels. Experimental results show that GPU Fine-Tuner outperforms GPU spatial multitasking, reducing average latency by up to 15% without performance degradation.
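The SM reassignment idea described above can be pictured with a minimal sketch. It is not the paper's algorithm: the `Kernel` record, the `saturation_sms` field (a hypothetical SM count beyond which a kernel's throughput stops improving), and the `allocate_sms` function are all assumptions introduced only for illustration. The sketch starts from an even SM split and moves SMs that a throughput-oriented kernel cannot use to the latency-critical kernels.

```python
from dataclasses import dataclass


@dataclass
class Kernel:
    name: str
    latency_critical: bool
    # Hypothetical profile value: SMs beyond this count do not
    # improve the kernel's throughput (e.g. it is memory-bound).
    saturation_sms: int


def allocate_sms(kernels, total_sms):
    """Return {kernel name: SM count} for one GPU with total_sms SMs."""
    base, extra = divmod(total_sms, len(kernels))
    # Even split; the first `extra` kernels receive one additional SM.
    alloc = {k.name: base + (1 if i < extra else 0)
             for i, k in enumerate(kernels)}

    critical = [k for k in kernels if k.latency_critical]
    if not critical:
        return alloc  # nothing to prioritize, keep the even split

    # Collect SMs that non-critical kernels hold above their saturation point.
    spare = 0
    for k in kernels:
        surplus = alloc[k.name] - k.saturation_sms
        if not k.latency_critical and surplus > 0:
            alloc[k.name] -= surplus
            spare += surplus

    # Hand the spare SMs to latency-critical kernels round-robin.
    for i in range(spare):
        alloc[critical[i % len(critical)].name] += 1
    return alloc


if __name__ == "__main__":
    kernels = [Kernel("infer", True, 16), Kernel("stream", False, 4)]
    print(allocate_sms(kernels, 16))  # {'infer': 12, 'stream': 4}
```

In this toy case the non-critical kernel keeps the four SMs it can actually use, so its throughput is unchanged under the stated assumption, while the latency-critical kernel grows from 8 to 12 SMs. A real framework would have to obtain the saturation point from profiling or runtime monitoring.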
Dynamic voltage and frequency scaling (DVFS) is one of the most widely used power management techniques; it improves performance or reduces power consumption by controlling voltages and frequencies in real time. When device-level DVFS is applied to graphics processing units (GPUs) that support spatial multitasking, it is difficult to determine the optimal DVFS state when the concurrently running kernels have different characteristics. To solve this problem, we built a GPU simulator in which streaming multiprocessors can operate at different frequencies according to the characteristics of their assigned kernels, and compared it with a single-clock spatial multitasking GPU simulator.
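As a rough illustration of choosing a clock per kernel rather than per device, the sketch below picks an SM frequency from a kernel's memory intensity. Everything in it is an assumption made for illustration and none of it is taken from the simulator: the `memory_intensity` metric (a hypothetical 0 to 1 score), the threshold, the frequency levels, and the first-order power model in which dynamic power scales as V²f with voltage taken to scale linearly with frequency.

```python
def pick_sm_frequency(memory_intensity, freqs_mhz, threshold=0.5):
    """Choose an SM clock for the kernel assigned to a partition.

    memory_intensity is a hypothetical score in [0, 1]. Memory-bound
    kernels gain little from a fast SM clock, so they get the lowest
    level; compute-bound kernels get the highest.
    """
    if memory_intensity >= threshold:
        return min(freqs_mhz)
    return max(freqs_mhz)


def relative_dynamic_power(freq_mhz, base_mhz):
    """Dynamic power relative to running at base_mhz.

    Assumes P ~ V^2 * f with V proportional to f, i.e. P ~ f^3.
    This is a first-order approximation, not a measured model.
    """
    return (freq_mhz / base_mhz) ** 3


if __name__ == "__main__":
    levels = [700, 1000, 1400]  # illustrative frequency levels in MHz
    f_mem = pick_sm_frequency(0.8, levels)   # memory-bound kernel
    f_cmp = pick_sm_frequency(0.2, levels)   # compute-bound kernel
    print(f_mem, f_cmp)                          # 700 1400
    print(relative_dynamic_power(f_mem, 1400))   # 0.125
```

Under these assumptions, a single device-wide clock would force both kernels to 1400 MHz, whereas per-partition clocks let the memory-bound kernel's SMs run at a fraction of the dynamic power.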