Any physical channel of communication offers two potential reasons why its capacity (the number of bits it can transmit in a unit of time) might be unbounded: (1) (uncountably) infinitely many choices of signal strength at any given instant of time, and (2) (uncountably) infinitely many instants of time at which signals may be sent. However, channel noise cancels out the potential unboundedness of the first aspect, leaving typical channels with only a finite capacity per instant of time. The latter source of infinity seems less extensively studied. A potential source of unreliability that might restrict the capacity from the second aspect as well is "delay": signals transmitted by the sender at a given point in time may arrive at the receiving end after an unpredictable delay. In this work we examine this source of uncertainty by considering a simple discrete model of delay errors. In our model the communicating parties get to subdivide time as microscopically finely as they wish, but still have to cope with communication delays that are macroscopic and variable. The continuous process is recovered as the limit of our process as the time subdivision becomes infinitesimal. We taxonomize this class of communication channels based on whether the delays and the noise are stochastic or adversarial, and on how much information each aspect has about the other when introducing its errors. We analyze the limits of such channels and reach somewhat surprising conclusions: the capacity of a physical channel is finitely bounded only if at least one of the two sources of error (signal noise or delay noise) is adversarial. In particular, the capacity is finitely bounded only if the delay is adversarial, or the noise is adversarial and acts with knowledge of the stochastic delay. If both error sources are stochastic, or if the noise is adversarial and independent of the stochastic delay, then the capacity of the associated physical channel is infinite!
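To make the discrete delay model concrete, here is a minimal Python sketch of the process described above: the sender places one symbol per micro-slot, and each symbol is delivered after a variable delay that is a whole number of macroscopic intervals. The uniform delay distribution and the binary alphabet are illustrative assumptions, not part of the model itself.

```python
import random

def simulate_delay_channel(signals, micro_per_macro, max_delay, seed=0):
    """Toy simulation of the discrete delay-error model: each transmitted
    symbol arrives after a stochastic, macroscopic delay (a whole number
    of macro intervals, each `micro_per_macro` micro-slots long)."""
    rng = random.Random(seed)
    received = {}
    for t, s in enumerate(signals):
        delay = rng.randint(0, max_delay) * micro_per_macro
        # Symbols landing in the same micro-slot are seen as an unordered pile.
        received.setdefault(t + delay, []).append(s)
    return received

# Example: 8 bits sent in consecutive micro-slots, 10 micro-slots per
# macro interval, delays of 0-3 macro intervals.
print(simulate_delay_channel([1, 0, 1, 1, 0, 0, 1, 0], 10, 3))
```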
We consider the problem of finding a set of k vertices of maximum total influence in a given undirected network, under the independent cascade (IC) model of influence spread. It is known that influence is monotone and submodular in the IC model, and hence a greedy algorithm achieves a (1 − 1/e)-approximation to this problem. Moreover, it is known to be NP-hard to achieve a better approximation factor in directed networks. We show that for undirected networks, this approximation barrier can be overcome: the greedy algorithm obtains a (1 − 1/e + c)-approximation to the optimal influence, for some constant c > 0. Our proof proceeds via a probabilistic analysis of bond percolation in arbitrary finite networks. We also show that the influence maximization problem remains APX-hard in undirected networks.
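For concreteness, the following Python sketch shows the standard greedy algorithm the abstract refers to, with the expected spread under the IC model estimated by Monte Carlo simulation. The adjacency-dict representation, the transmission probability p, and the number of trials are illustrative choices; the theoretical guarantee assumes exact oracle access to the spread.

```python
import random

def ic_spread(graph, seeds, p, trials=200, seed=0):
    """Monte-Carlo estimate of the expected influence of `seeds` under the
    independent cascade model: each edge transmits independently w.p. p."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        active, frontier = set(seeds), list(seeds)
        while frontier:
            u = frontier.pop()
            for v in graph[u]:
                if v not in active and rng.random() < p:
                    active.add(v)
                    frontier.append(v)
        total += len(active)
    return total / trials

def greedy_influence(graph, k, p):
    """Greedily add the vertex with the largest estimated marginal gain in
    spread; by monotonicity and submodularity this is a (1 - 1/e)-
    approximation (up to Monte-Carlo estimation error)."""
    seeds = set()
    for _ in range(k):
        best = max((v for v in graph if v not in seeds),
                   key=lambda v: ic_spread(graph, seeds | {v}, p))
        seeds.add(best)
    return seeds

# Example: an undirected 5-cycle, stored as an adjacency dict.
cycle = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
print(greedy_influence(cycle, k=2, p=0.3))
```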
Archiving is important for scientific data, where it is necessary to record all past versions of a database in order to verify findings based upon a specific version. Much scientific data is held in a hierarchical format and has a key structure that provides a canonical identification for each element of the hierarchy. In this article, we exploit these properties to develop an archiving technique that is efficient in its use of space and preserves the continuity of elements through versions of the database, something that is not provided by traditional minimum-edit-distance diff approaches. The approach is based on timestamps: all versions of the data are merged into one hierarchy, where an element appearing in multiple versions is stored only once, along with a timestamp. By identifying the semantic continuity of elements and merging them into one data structure, our technique is capable of providing meaningful change descriptions. The archive also allows us to easily answer certain temporal queries, such as retrieval of any specific version from the archive and finding the history of an element. This is in contrast with approaches that store a sequence of deltas, where such operations may require undoing a large number of changes or significant reasoning with the deltas. A suite of experiments also demonstrates that our archive does not incur any significant space overhead when contrasted with diff approaches. Another useful property of our approach is that we use the XML format to represent hierarchical data, and the resulting archive is also in XML. Hence, XML tools can be applied directly to our archive. In particular, we apply an XML compressor to our archive, and our experiments show that our compressed archive outperforms compressed diff-based repositories in space efficiency. We also show how we can extend our archiving tool to an external-memory archiver for higher scalability, and describe various index structures that can further improve the efficiency of some temporal queries on our archive.
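The core merge idea can be sketched in a few lines of Python. This toy version uses nested dicts keyed by the canonical element keys and stamps each element with the set of versions containing it; the actual archiver stores compact timestamp ranges on an XML hierarchy, so this is only a schematic illustration.

```python
def merge_version(archive, version, snapshot):
    """Merge one version of a keyed hierarchy into the archive: an element
    appearing in many versions is stored once, stamped with the versions
    in which it appears.  `snapshot` maps element keys to child dicts."""
    for key, children in snapshot.items():
        node = archive.setdefault(key, {"versions": set(), "children": {}})
        node["versions"].add(version)
        merge_version(node["children"], version, children)

def retrieve_version(archive, version):
    """Answer the temporal query 'retrieve version v' directly from the
    merged hierarchy, with no replaying of deltas."""
    return {key: retrieve_version(node["children"], version)
            for key, node in archive.items()
            if version in node["versions"]}

archive = {}
merge_version(archive, 1, {"db": {"gene:abc": {}, "gene:xyz": {}}})
merge_version(archive, 2, {"db": {"gene:abc": {}, "gene:pqr": {}}})
print(retrieve_version(archive, 1))  # {'db': {'gene:abc': {}, 'gene:xyz': {}}}
```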
Scientific workflow management systems are increasingly providing the ability to manage and query the provenance of data products. However, the problem of differencing the provenance of two data products produced by executions of the same specification has not been adequately addressed. Although this problem is NP-hard for general workflow specifications, an analysis of real scientific (and business) workflows shows that their specifications can be captured as series-parallel graphs overlaid with well-nested forking and looping. For this natural restriction, we present efficient, polynomial-time algorithms for differencing executions of the same specification and thereby understanding the difference in the provenance of their data products. We then describe a prototype called PDiffView built around our differencing algorithm. Experimental results demonstrate the scalability of our approach on real, collected workflows and increasingly complex runs.
In the ASYMMETRIC k-CENTER problem, the input is an integer k and a complete digraph over n points together with a distance function obeying the directed triangle inequality. The goal is to choose a set of k points to serve as centers and to assign all the points to the centers, so that the maximum distance of any point from its center is as small as possible. We show that the ASYMMETRIC k-CENTER problem is hard to approximate up to a factor of $\log^* n - O(1)$ unless $\mathrm{NP} \subseteq \mathrm{DTIME}(n^{\log\log n})$. Since an $O(\log^* n)$-approximation algorithm is known for this problem, this resolves the asymptotic approximability of ASYMMETRIC k-CENTER. This is the first natural problem whose approximability threshold does not polynomially relate to the known approximation classes. We also resolve the approximability threshold of the metric (symmetric) k-CENTER problem with costs.
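To pin down the objective just defined, here is a brute-force exact solver for tiny instances in Python. It assumes `dist[c][v]` is the directed distance from a prospective center c to a point v, with each point assigned to the center that covers it most cheaply; this convention for the direction of coverage is our reading of the problem statement above, not something specified by the code.

```python
from itertools import combinations

def asymmetric_k_center(dist, k):
    """Exhaustive search over all k-subsets of centers, minimizing the
    maximum over points v of the directed distance from v's nearest
    center to v.  Exponential in k; for illustration only."""
    n = len(dist)
    best_val, best_centers = float("inf"), None
    for centers in combinations(range(n), k):
        val = max(min(dist[c][v] for c in centers) for v in range(n))
        if val < best_val:
            best_val, best_centers = val, centers
    return best_val, best_centers
```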
Maximum bipartite matching (MBM) is a fundamental problem in combinatorial optimization with a long and rich history. A classic result of Hopcroft and Karp (1973) provides an $O(m\sqrt{n})$-time algorithm for the problem, where n and m are the number of vertices and edges in the input graph, respectively. For dense graphs, an approach based on fast matrix multiplication achieves a running time of $O(n^{2.371})$. For several decades, these results represented the state of the art, until, in 2013, Madry introduced a powerful new approach for solving MBM using continuous optimization techniques. This line of research, which builds on continuous techniques based on interior-point methods, led to several spectacular results, culminating in a breakthrough $m^{1+o(1)}$-time algorithm for min-cost flow, which implies an $m^{1+o(1)}$-time algorithm for MBM as well. These striking advances naturally raise the question of whether combinatorial algorithms can match the performance of the algorithms based on continuous techniques for MBM. One reason to explore combinatorial algorithms is that they are often more transparent than their continuous counterparts, and the tools and techniques developed for them may be useful in other settings, including, for example, developing faster algorithms for maximum matching in general graphs. A recent work of Chuzhoy and Khanna (2024) made progress on this question by giving a combinatorial $\tilde{O}(m^{1/3}n^{5/3})$-time algorithm for MBM, thus outperforming both the Hopcroft-Karp algorithm and the matrix-multiplication-based approaches on sufficiently dense graphs. Still, a large gap remains between the running time of their algorithm and the almost-linear time achievable by algorithms based on continuous techniques. In this work, we take another step towards narrowing this gap and present a randomized $n^{2+o(1)}$-time combinatorial algorithm for MBM. Thus, in dense graphs, our algorithm essentially matches the performance of algorithms that are based on continuous methods. Similar to the classical algorithms for MBM and the approach used in the work of Chuzhoy and Khanna (2024), our algorithm is based on iterative augmentation of a current matching using augmenting paths in the corresponding (directed) residual flow network. Our main contribution is a recursive algorithm that exploits the special structure of the resulting flow problem to recover an $\Omega(1/\log^2 n)$-fraction of the remaining augmentations in $n^{2+o(1)}$ time. Finally, we obtain a randomized $n^{2+o(1)}$-time algorithm for maximum vertex-capacitated s-t flow in directed graphs when all vertex capacities are identical, using a standard reduction from that problem to MBM.
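As context for the iterative-augmentation framework mentioned above, here is the classical augmenting-path scheme for MBM in Python. This is the baseline that Hopcroft-Karp accelerates by batching augmentations, shown only for orientation; it is not the paper's $n^{2+o(1)}$ algorithm.

```python
def max_bipartite_matching(adj, n_left, n_right):
    """Kuhn's classical algorithm: grow the matching one augmenting path
    at a time in the residual graph.  adj[u] lists the right-side
    neighbours of left vertex u.  Worst-case O(V*E) time."""
    match_l = [-1] * n_left   # match_l[u] = partner of left u, or -1
    match_r = [-1] * n_right  # match_r[v] = partner of right v, or -1

    def augment(u, seen):
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                # Use edge (u, v) if v is free, or if v's current partner
                # can be re-matched along an alternating path.
                if match_r[v] == -1 or augment(match_r[v], seen):
                    match_l[u], match_r[v] = v, u
                    return True
        return False

    return sum(augment(u, set()) for u in range(n_left))

# Example: K_{2,2} minus one edge still has a perfect matching of size 2.
print(max_bipartite_matching({0: [0, 1], 1: [0]}, n_left=2, n_right=2))  # 2
```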
The approximability of the maximum edge-disjoint paths problem (EDP) in directed graphs was seemingly settled by an $\Omega(m^{1/2-\epsilon})$-hardness result of Guruswami et al. [2003], and an $O(\sqrt{m})$-approximation achievable via a natural multicommodity-flow-based LP relaxation as well as a greedy algorithm. Here m is the number of edges in the graph. We observe that the $\Omega(m^{1/2-\epsilon})$-hardness of approximation applies to sparse graphs, and hence, when expressed as a function of n, the number of vertices, only an $\Omega(n^{1/2-\epsilon})$-hardness follows. On the other hand, $O(\sqrt{m})$-approximation algorithms do not guarantee a sublinear (in terms of n) approximation for dense graphs. We note that a similar gap exists in the known results on the integrality gap of the flow-based LP relaxation: an $\Omega(\sqrt{n})$ lower bound and an $O(\sqrt{m})$ upper bound. Motivated by this discrepancy in the upper and lower bounds, we study algorithms for EDP in directed and undirected graphs and obtain improved approximation ratios. We show that the greedy algorithm has an approximation ratio of $O(\min(n^{2/3}, \sqrt{m}))$ in undirected graphs and a ratio of $O(\min(n^{4/5}, \sqrt{m}))$ in directed graphs. For acyclic graphs we give an $O(\sqrt{n}\ln n)$-approximation via LP rounding. These are the first sublinear (in n) approximation ratios for EDP. The results also extend to EDP with weights and to the uniform-capacity unsplittable flow problem (UCUFP).
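For reference, a short Python sketch of the shortest-first greedy algorithm analyzed above, in the undirected case: while some terminal pair is still routable, route a pair with the currently shortest connecting path and delete that path's edges. Tie-breaking and the graph representation are illustrative choices.

```python
from collections import deque

def greedy_edp(n, edges, pairs):
    """Shortest-first greedy for edge-disjoint paths on an undirected
    graph with vertices 0..n-1."""
    adj = {u: set() for u in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    def bfs(s, t):
        """Shortest s-t path in the current graph, or None."""
        prev, q = {s: None}, deque([s])
        while q:
            u = q.popleft()
            if u == t:
                path = []
                while u is not None:
                    path.append(u)
                    u = prev[u]
                return path[::-1]
            for w in adj[u]:
                if w not in prev:
                    prev[w] = u
                    q.append(w)
        return None

    routed, remaining = [], list(pairs)
    while remaining:
        paths = [(p, bfs(*p)) for p in remaining]
        pair, path = min(paths,
                         key=lambda x: len(x[1]) if x[1] else float("inf"))
        if path is None:  # no remaining pair can be routed
            break
        routed.append((pair, path))
        remaining.remove(pair)
        for a, b in zip(path, path[1:]):  # consume the path's edges
            adj[a].discard(b)
            adj[b].discard(a)
    return routed
```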
A seminal work of [Ahn-Guha-McGregor, PODS'12] showed that one can compute a cut sparsifier of an unweighted undirected graph by taking a near-linear number of linear measurements on the graph. Subsequent works also studied computing other graph sparsifiers using linear sketching, and obtained near-linear upper bounds for spectral sparsifiers [Kapralov-Lee-Musco-Musco-Sidford, FOCS'14] and the first non-trivial upper bounds for spanners [Filtser-Kapralov-Nouri, SODA'21]. All these linear sketching algorithms, however, work only on unweighted graphs, and are extended to weighted graphs by weight grouping, a non-linear operation not implementable in, for instance, general turnstile streams.

In this paper, we initiate the study of weighted graph sparsification by linear sketching by investigating a natural class of linear sketches that we call incidence sketches, in which each measurement is a linear combination of the weights of edges incident on a single vertex. This class captures all the aforementioned linear sketches for unweighted sparsification. It also covers linear sketches implementable in the simultaneous communication model, where edges are distributed across n machines. Our results are:

1) Weighted cut sparsification: We give an algorithm that computes a $(1+\epsilon)$-cut sparsifier using $\tilde{O}(n\epsilon^{-3})$ linear measurements, which is nearly optimal. This also implies a turnstile streaming algorithm with $\tilde{O}(n\epsilon^{-3})$ space. Our algorithm is achieved by building a so-called "weighted edge sampler" for each vertex.

2) Weighted spectral sparsification: We give an algorithm that computes a $(1+\epsilon)$-spectral sparsifier using $\tilde{O}(n^{6/5}\epsilon^{-4})$ linear measurements. This also implies a turnstile streaming algorithm with $\tilde{O}(n^{6/5}\epsilon^{-4})$ space. Key to our algorithm is a novel analysis of how the effective resistances change under vertex sampling. Complementing our algorithm, we then prove a superlinear lower bound of $\Omega(n^{21/20-o(1)})$ measurements for computing an $O(1)$-spectral sparsifier using incidence sketches.

3) Weighted spanner computation: We first show that any algorithm using $o(n^{2})$ linear measurements can, in general, only recover a spanner whose stretch depends linearly on $\frac{w_{\max}}{w_{\min}}$. We thus focus on graphs with $\frac{w_{\max}}{w_{\min}}=O(1)$ and study the stretch's dependence on n. On such graphs, the algorithm in [Filtser-Kapralov-Nouri, SODA'21] can obtain a spanner of stretch $\tilde{O}\left(n^{\frac{2}{3}\left(1-\alpha\right)}\right)$ using $\tilde{O}(n^{1+\alpha})$ measurements for any $\alpha\in [0,1]$. We prove that, for incidence sketches, this tradeoff is optimal up to an $n^{o(1)}$ factor for all $\alpha < 1/10$.

We prove both our lower bounds by analyzing the "effective resistances" in certain matrix-weighted graphs, for which we develop a number of new tools, most notably (i) a matrix-weighted analog of the widely used expander decomposition of ordinary graphs, and (ii) a proof that a random vertex-induced subgraph of a matrix-weighted expander is also an expander. We believe these tools are of independent interest.
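To illustrate what an incidence sketch is, the following toy Python/NumPy snippet builds each vertex's signed, weighted incidence row and applies a random sketch matrix to it. Summing the per-vertex measurements over a vertex set cancels all internal edges by linearity, leaving a sketch of the cut; this is the basic property such sketches exploit. The dense Gaussian sketch matrix is purely illustrative; the algorithms above use carefully designed sparse-recovery sketches.

```python
import numpy as np

def incidence_sketch(n, edges, weights, k, seed=0):
    """One row of k linear measurements per vertex: vertex u's row is a
    random linear sketch of its signed incidence vector, whose entry for
    edge e = (u, v) with u < v is +w_e at u and -w_e at v."""
    rng = np.random.default_rng(seed)
    m = len(edges)
    B = np.zeros((n, m))
    for i, ((u, v), w) in enumerate(zip(edges, weights)):
        B[u, i], B[v, i] = w, -w
    S = rng.normal(size=(m, k))  # illustrative dense sketch matrix
    return B @ S

# On a weighted triangle, the sketches of {0, 1} sum to a sketch of the
# cut ({0,1}, {2}): the internal edge (0,1) cancels by linearity.
sk = incidence_sketch(3, [(0, 1), (0, 2), (1, 2)], [5.0, 2.0, 3.0], k=4)
cut_sketch = sk[0] + sk[1]  # depends only on edges (0,2) and (1,2)
```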