
On space constrained computations

2016 
Since the advent of the digital computer, its supporting technology has been characterized by steady and impressive growth. Although most parameters are still being improved, there is an emerging consensus that physical limitations on signal propagation speed and device size are becoming increasingly significant. Ultimately, the access time of the memory associated with a digital computer is bound to increase with the size of the memory. Therefore, when a large overall memory is required, it becomes convenient to organize it hierarchically into a sequence of levels whose size and access time increase progressively. Typically, the levels of the memory hierarchy include the CPU registers, two or three cache levels, main memory, and disks. Compared to the CPU registers, main memory is a few hundred times slower and disks are a few million times slower; hence, effective use of the fastest levels of the memory hierarchy is becoming a key concern in the design and implementation of algorithms. Physical limitations are also a concern in the context of multiprocessor architectures and parallel computing, due to the delay introduced by the speed of the signals used for communication between the various processing units. Furthermore, while Moore's Law predicts an exponential speedup of hardware in general, the annual improvement rate of the time per arithmetic operation has, over the years, consistently exceeded that of the time per word read or written. The fraction of running time spent on communication is thus expected to increase further, becoming more and more of a bottleneck for the performance of both multi-level memory and parallel computing architectures.

When considering the complexity of algorithms, two kinds of costs must therefore be considered: the arithmetic cost, which depends on the number of required computational steps, and the communication cost, which depends on the movement of data during the execution of an algorithm, either between levels of a memory hierarchy (in the sequential case) or over a network connecting processors (in the parallel case). In both of these application scenarios, the communication component of an algorithm often costs significantly more time than its arithmetic component. It is therefore of interest to investigate, on the one hand, the minimum memory space required to execute an algorithm (its space complexity), and, on the other, the tradeoff between the memory space actually used and the data communication needed for the algorithm's execution (its I/O complexity). Beyond the purely theoretical interest of such an analysis, the pursuit of good lower-bound techniques is also crucial for the design of high-performance algorithms, since such bounds make it possible to evaluate how far a proposed solution is from optimality.

In our study we focus on computations performed by straight-line programs (as opposed to branching programs) in a data-independent fashion, where the sequence of operations to be executed is not influenced by the specific values of the input (as opposed to data-dependent computations). Such computations can be modeled as a Computational Directed Acyclic Graph (CDAG) G(V, E), whose vertices represent operations (of both input/output and processing type) and whose edges represent data dependencies.
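To make the model concrete, the following is a minimal sketch (in Python; the class, method, and vertex names are illustrative and not taken from the thesis) of one possible representation of a CDAG: vertices with no incoming edges play the role of inputs, and every other vertex is an operation whose operands are its predecessors.

```python
from collections import defaultdict

class CDAG:
    """Minimal sketch of a computational DAG G(V, E).

    Each vertex represents an operation; a directed edge (u, v) means that
    the value produced by u is an operand of v. Vertices with no incoming
    edges act as input vertices.
    """

    def __init__(self):
        self.predecessors = defaultdict(list)  # v -> operand vertices of v
        self.successors = defaultdict(list)    # u -> vertices that consume u

    def add_edge(self, u, v):
        self.predecessors[v].append(u)
        self.successors[u].append(v)

    def inputs(self):
        vertices = set(self.predecessors) | set(self.successors)
        return sorted(v for v in vertices if not self.predecessors[v])

# Example: the CDAG of the straight-line program computing (a + b) * (b + c).
g = CDAG()
for u, v in [("a", "add1"), ("b", "add1"), ("b", "add2"), ("c", "add2"),
             ("add1", "mul"), ("add2", "mul")]:
    g.add_edge(u, v)
print(g.inputs())  # ['a', 'b', 'c']
```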
In this thesis we investigate various aspects of CDAG computations, among which their memory requirements and the amount of data movement (either between levels of a memory hierarchy or between various processors executing a program in parallel) required when only a limited amount of memory space is available. This thesis is organized as follows. In Chapter 1, we introduce the notation and the main definitions used throughout the presentation.

In Chapter 2, we study lower bounds on the size of the memory space necessary to compute various CDAGs. We introduce the Pebble Game, a theoretical device used in the literature to study the space requirements of CDAG computations. We then describe the Marking Rule approach originally introduced by Bilardi et al. in [10] and apply it to obtain lower bounds on the space complexity of Superconcentrator-Stack CDAGs [68, 40]. In order to study the limits of the Marking Rule approach, we introduce the concept of a visit of a CDAG, and we prove various upper bounds on the memory space required to visit a CDAG under appropriate conditions.

In Chapter 3, we study upper bounds on the minimum memory space necessary to compute any CDAG. After reviewing the main contributions in the literature [35, 40, 43], we present a novel algorithm which makes it possible to pebble any CDAG with |E| = m edges using at most O(m/\log m) memory space.

In Chapter 4, we direct our attention to the input-output (I/O) cost of CDAG computations, understood either as the data exchange between different levels of a memory hierarchy, or as the communication of data between various processors executing a program in parallel. In particular, we obtain lower bounds on the I/O complexity of CDAG computations with respect to the classical model of Hong and Kung [37]. We begin by studying the I/O complexity of Strassen's algorithm when executed sequentially on a machine equipped with a two-level memory hierarchy. We provide a technique, alternative to those in [7, 62], to obtain a tight lower bound on the I/O complexity of Strassen's matrix multiplication algorithm for computations in which no intermediate result is ever recomputed. We then obtain the first asymptotically tight lower bound on the I/O complexity of Strassen's algorithm for general computations, that is, computations without any restriction on the recomputation of intermediate values; this result, which holds for all possible computations without constraints on the number of times an intermediate result can be computed, is the main contribution of Chapter 4. We also study the I/O complexity of Strassen's algorithm when executed in parallel by P processors, each equipped with a finite memory. Here we obtain a lower bound which holds for any computation (no restriction on recomputation), without any assumption regarding the distribution of the input data among the P processors at the beginning of the computation.

In Chapter 5, we consider the effect of careful memory utilization in the context of error-resilient algorithms, which provide (almost) correct solutions even when silent memory errors occur. In particular, we provide a brief overview of the results published by the author in [22].
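The space requirements studied in Chapters 2 and 3 are phrased in terms of the pebble game. As a purely illustrative aid (this is not the algorithm analyzed in the thesis, and the function and vertex names are hypothetical), the sketch below simulates a black-pebbling strategy and reports its space usage, i.e., the maximum number of pebbles simultaneously on the CDAG: a vertex may be pebbled only when all of its predecessors carry pebbles, and a pebble may be removed at any time.

```python
def pebbling_space(predecessors, moves):
    """Simulate a black-pebble game on a CDAG and return the peak pebble count.

    predecessors: dict mapping each vertex to the list of its operand
                  vertices (input vertices map to an empty list).
    moves: sequence of ("place", v) / ("remove", v) steps.
    """
    pebbled = set()
    peak = 0
    for action, v in moves:
        if action == "place":
            # A vertex can be pebbled only if all its predecessors are pebbled
            # (input vertices, having no predecessors, can always be pebbled).
            if any(u not in pebbled for u in predecessors.get(v, [])):
                raise ValueError(f"illegal move: {v} has an unpebbled predecessor")
            pebbled.add(v)
            peak = max(peak, len(pebbled))
        elif action == "remove":
            pebbled.discard(v)
    return peak

# A legal pebbling of the CDAG of (a + b) * (b + c), using 4 pebbles at its peak.
preds = {"a": [], "b": [], "c": [],
         "add1": ["a", "b"], "add2": ["b", "c"], "mul": ["add1", "add2"]}
moves = [("place", "a"), ("place", "b"), ("place", "add1"), ("remove", "a"),
         ("place", "c"), ("place", "add2"), ("remove", "b"), ("remove", "c"),
         ("place", "mul")]
print(pebbling_space(preds, moves))  # 4
```

In these terms, the space complexity of a CDAG is the minimum, over all legal pebbling strategies that pebble every output, of this peak pebble count.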