Specialized accelerators such as GPUs, TPUs, FPGAs, and custom ASICs have been increasingly deployed to train deep learning models. These accelerators exhibit heterogeneous performance behavior across model architectures. Existing schedulers for clusters of accelerators, which are used to arbitrate these expensive training resources across many users, have shown how to optimize for various multi-job, multi-user objectives, like fairness and makespan. Unfortunately, existing schedulers largely do not consider performance heterogeneity. In this paper, we propose Gavel, a heterogeneity-aware scheduler that systematically generalizes a wide range of existing scheduling policies. Gavel expresses these policies as optimization problems, making it easy to optimize for objectives in a heterogeneity-aware way, while also being cognizant of performance optimizations like space sharing. Gavel then uses a round-based scheduling mechanism to ensure jobs receive their ideal allocation given the target scheduling policy. Gavel’s heterogeneity-aware policies allow a heterogeneous cluster to sustain higher input load, and improve end objectives such as average job completion time and makespan by up to 3.5× compared to heterogeneity-agnostic policies.
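As a minimal sketch of the kind of optimization problem such a policy reduces to (illustrative only, with made-up throughput numbers and cluster sizes, not Gavel's actual code), a heterogeneity-aware max-min fairness policy can be written as a linear program over the fraction of time each job spends on each accelerator type:

```python
# Illustrative LP: heterogeneity-aware max-min fairness.
# X[i, j] = fraction of time job i runs on accelerator type j.
import cvxpy as cp
import numpy as np

# Hypothetical measured throughputs (steps/sec) for 3 jobs on 3 accelerator types.
throughputs = np.array([
    [4.0, 2.0, 1.0],
    [3.0, 2.5, 0.5],
    [1.5, 1.5, 1.0],
])
num_workers = np.array([2, 4, 8])   # machines available per accelerator type
num_jobs, num_types = throughputs.shape

X = cp.Variable((num_jobs, num_types), nonneg=True)
effective = cp.sum(cp.multiply(X, throughputs), axis=1)   # per-job effective throughput
scale = throughputs.max(axis=1)                           # best single-type rate per job

objective = cp.Maximize(cp.min(cp.multiply(effective, 1.0 / scale)))
constraints = [
    cp.sum(X, axis=1) <= 1,            # a job cannot exceed 100% of wall-clock time
    cp.sum(X, axis=0) <= num_workers,  # cannot exceed machines of each type
]
cp.Problem(objective, constraints).solve()
print(np.round(X.value, 2))
```

Other policies (e.g., makespan minimization or weighted fairness) swap out the objective while keeping the same allocation variables and capacity constraints.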
Resource allocation problems in many computer systems can be formulated as mathematical optimization problems. However, finding exact solutions to these problems using off-the-shelf solvers is often intractable for large problem sizes with tight SLAs, leading system designers to rely on cheap, heuristic algorithms. We observe, however, that many allocation problems are granular: they consist of a large number of clients and resources, each client requests a small fraction of the total number of resources, and clients can interchangeably use different resources. For these problems, we propose an alternative approach that reuses the original optimization problem formulation and leads to better allocations than domain-specific heuristics. Our technique, Partitioned Optimization Problems (POP), randomly splits the problem into smaller problems (with a subset of the clients and resources in the system) and coalesces the resulting sub-allocations into a global allocation for all clients. We provide theoretical and empirical evidence as to why random partitioning works well. In our experiments, POP achieves allocations within 1.5% of the optimal with orders-of-magnitude improvements in runtime compared to existing systems for cluster scheduling, traffic engineering, and load balancing.
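A minimal sketch of the partition-and-coalesce idea (illustrative only, not the POP implementation): clients and resources are shuffled into k shards, each shard is solved independently, and the per-shard allocations are merged. Here `solve_allocation` is a placeholder for whatever solver would be applied to the full problem.

```python
# Illustrative sketch of POP-style random partitioning.
import random

def pop_allocate(clients, resources, solve_allocation, k=4, seed=0):
    """Randomly split clients and resources into k shards, solve each shard
    independently, and merge the per-shard allocations into one global map."""
    rng = random.Random(seed)
    clients = clients[:]
    resources = resources[:]
    rng.shuffle(clients)
    rng.shuffle(resources)

    allocation = {}
    for i in range(k):
        client_shard = clients[i::k]       # every k-th shuffled client
        resource_shard = resources[i::k]   # every k-th shuffled resource
        # Each sub-problem is ~1/k the size of the original, so it solves much faster.
        allocation.update(solve_allocation(client_shard, resource_shard))
    return allocation
```

In practice the per-shard solver would be the same optimization formulation used for the full problem, restricted to the shard's clients and resources.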
Major cloud providers have stated public plans to lower their carbon emissions. Historically, this has meant focusing on emissions from producing the electricity consumed by datacenters. While work and challenges remain on this front, research and industry are actively working on the next step of reducing the carbon embedded in servers and racks. At a high level, a promising direction for reducing embodied carbon is to avoid emissions from new manufacturing, which often requires using existing components, devices, and buildings for longer. However, much of the data around carbon breakdowns and reduction opportunities remains siloed, leading to speculation and assumptions, both internally and externally, about the opportunities to reduce datacenter carbon intensity. We aim to clarify some of the misconceptions we have encountered.
Due to the inherent randomness of both solar power generation and residential electrical load, jointly sizing solar panel and storage capacity to meet a given quality-of-service (QoS) constraint is challenging. The challenge is greater when representative historical data are limited. We therefore propose generating synthetic solar and load traces, corresponding to different realizations of the underlying stochastic processes. Specifically, we compare the effectiveness of three generative models: autoregressive moving-average (ARMA) models, Gaussian mixture models (GMMs), and generative adversarial networks (GANs), as well as two direct sampling methods, for synthetic trace generation. These traces are then used for robust joint sizing by a technique described in recent work. Extensive experiments based on real data show that our approach finds robust sizings with only one year's worth of hourly trace data. Moreover, assuming that solar data are available and given a database of load traces, we demonstrate how to perform robust sizing with access to only twelve data points of load, one for each month of one year.
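As an illustration of one of the generative approaches, the sketch below fits a Gaussian mixture model to historical daily load profiles and samples synthetic days; the preprocessing, component count, and clipping are assumptions for the sketch, not the paper's exact setup.

```python
# Illustrative sketch: fit a GMM to daily load profiles and sample synthetic days.
import numpy as np
from sklearn.mixture import GaussianMixture

def synthesize_load(hourly_load, n_days=365, n_components=4, seed=0):
    """hourly_load: 1-D array of historical hourly load (length must be a
    multiple of 24, i.e., whole days). Returns n_days synthetic daily profiles."""
    days = hourly_load.reshape(-1, 24)            # one row per observed day
    gmm = GaussianMixture(n_components=n_components, random_state=seed)
    gmm.fit(days)
    synthetic, _ = gmm.sample(n_days)             # draw new 24-hour profiles
    return np.clip(synthetic, 0.0, None)          # load cannot be negative
```

The synthetic traces stand in for unseen years when stress-testing a candidate solar-plus-storage sizing against the QoS constraint.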
Reinforcement learning (RL) is often considered a promising approach for controlling complex building operations. In this context, RL algorithms are typically evaluated using a testing framework that simulates building operations. To make general claims and avoid overfitting, an RL method should be evaluated on a large and diverse set of buildings. Unfortunately, due to the complexity of creating building simulations, none of the existing frameworks provide more than a handful of simulated buildings. Moreover, each framework has its own particularities, which makes it difficult to evaluate the same algorithm on multiple frameworks. To address this, we present Beobench: a Python toolkit that provides unified access to building simulations from multiple frameworks using a container-based approach. We demonstrate the power of our approach with an example showing how Beobench can launch RL experiments in any supported framework with a single command.
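To illustrate the container-based idea in general terms (not Beobench's actual CLI, configuration format, or images, which its documentation defines), each simulation framework can be packaged as its own container image so that experiments launch the same way regardless of the framework's native dependencies:

```python
# Illustrative sketch of a container-based launcher. The image name and
# container paths are placeholders, not Beobench's actual interface.
import os
import subprocess

def launch_experiment(framework_image: str, config_path: str) -> None:
    """Run one RL experiment inside a framework-specific container."""
    subprocess.run(
        [
            "docker", "run", "--rm",
            # Mount the experiment config read-only into the container.
            "-v", f"{os.path.abspath(config_path)}:/experiment/config.yaml:ro",
            framework_image,
        ],
        check=True,
    )

# Example (placeholder image name):
# launch_experiment("example/building-sim:latest", "experiment.yaml")
```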
Resource allocation problems in many computer systems can be formulated as mathematical optimization problems. However, finding exact solutions to these problems using off-the-shelf solvers in an online setting is often intractable for "hyper-scale" system sizes with tight SLAs, leading system designers to rely on cheap, heuristic algorithms. In this work, we explore an alternative approach that reuses the original optimization problem formulation. By splitting the original problem into smaller, more tractable problems for subsets of the system and then coalescing the resulting sub-allocations into a global solution, we achieve empirically quasi-optimal (within 1.5%) performance for multiple domains with several orders-of-magnitude improvement in runtime. Deciding how to split a large problem into smaller sub-problems, and how to coalesce the resulting sub-allocations into a unified allocation, must be done carefully and in a domain-aware way. We show common principles for splitting problems effectively across a variety of tasks, including cluster scheduling, traffic engineering, and load balancing.
The growing demands for computational power in cloud computing have led to a significant increase in the deployment of high-performance servers. The growing power consumption of servers and the heat they produce are on track to outpace the capacity of conventional air cooling systems, necessitating more efficient cooling solutions such as liquid immersion cooling. The superior heat-exchange capability of immersion cooling eliminates the need for bulky heat sinks, fans, and airflow channels, while also unlocking the potential to go beyond conventional 2D blade servers to three-dimensional designs. In this work, we present a computational framework to explore server designs in three-dimensional space, specifically targeting the maximization of server density within immersion cooling tanks. Our tool is designed to handle a variety of physical and electrical server design constraints. We demonstrate that our optimized designs can reduce server volume by 25--52% compared to traditional flat server designs. This increased density reduces land usage as well as the amount of liquid used for immersion, with a significant reduction in the carbon emissions embodied in datacenter buildings. We further create physical prototypes to simulate dense server designs and perform real-world experiments in an immersion cooling tank, demonstrating that they operate at safe temperatures. This approach marks a critical step forward in sustainable and efficient datacenter management.
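As a rough illustration of why 3D stacking increases density (hypothetical dimensions, not the paper's framework or measurements), compare the bounding-box volume of boards stacked vertically inside a tank against the same boards laid out in flat, chassis-per-board fashion:

```python
# Back-of-the-envelope density comparison for stacked vs. flat board layouts.
def board_stack_volume(n_boards, board_w, board_d, board_h, clearance):
    """Bounding-box volume when n_boards are stacked vertically with a fixed
    inter-board clearance left for coolant flow."""
    height = n_boards * board_h + (n_boards - 1) * clearance
    return board_w * board_d * height

def flat_layout_volume(n_boards, board_w, board_d, chassis_h):
    """Bounding-box volume when each board sits in its own flat chassis."""
    return n_boards * board_w * board_d * chassis_h

# Hypothetical numbers: 8 boards of 40x30 cm with 2 cm-tall components and
# 1 cm coolant clearance, versus a 4.5 cm-tall flat chassis per board.
stacked = board_stack_volume(8, 0.40, 0.30, 0.02, 0.01)
flat = flat_layout_volume(8, 0.40, 0.30, 0.045)
print(f"volume reduction: {100 * (1 - stacked / flat):.0f}%")  # ~36% for these numbers
```

The actual framework additionally enforces physical and electrical constraints (e.g., interconnect lengths and component clearances) rather than treating boards as simple boxes.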