Dynamic Load Balancing for Robust Distributed Computing in the Presence of Topological Impairments

Majeed M. Hayat, Jorge E. Pezoa, David Dietz, Sagar Dhakal

2009

The purpose of any distributed computing system (DCS) is to offer a flexible, reliable, and powerful computing platform. With advances in mobile computing, wireless communications, and sensor networks, DCSs have emerged in new applications such as wireless sensor networks (WSNs), military battlefield awareness, and surveillance and threat detection, to name a few. These application areas introduce new challenges when DCSs are deployed or operated in harsh or threat-prone environments. For instance, in WSNs deployed in a military battlefield, the computing elements (CEs) of a DCS may join and leave the system at any time in a stochastic fashion. More generally, factors such as limited or intermittent communication resources, power constraints, or long-term physical damage to the CEs can cause random topological changes in the DCS, which in turn can severely degrade its performance and reliability. Many of these factors can be attributed to physical attacks on our information infrastructure, of which weapons of mass destruction (WMD) are an important example. This observation has prompted government agencies, such as the Defense Threat Reduction Agency, to launch research initiatives in network science, both to understand the extent of damage that attacks can inflict upon networks and to develop strategies that increase the robustness of networks when a threat is present. In this article, we review modern dynamic load balancing (DLB) techniques and their stochastic mathematical models, which DCS developers can exploit to increase the DCS's robustness to random topological changes while using the system's available computing resources efficiently in the presence of communication uncertainty and CE dysfunction. Two scenarios are considered: one where CEs can fail and recover at random instants, and another where CEs can fail permanently. In the first scenario, the goal is to minimize the average response time of a given application. In the second scenario, the goal is to maximize the probability of running an entire application successfully. DLB policies are tested on a small-scale DCS environment and compared with theoretical predictions as well as with results from Monte Carlo simulations. The probabilistic model of network performance presented here is general and can be applied to a broad class of networks and applications.

Keywords: distributed computing; load balancing; network robustness; WMD attack; reliability; queuing theory; renewal theory
