Given a set of candidate Datalog rules, the Datalog synthesis-as-rule-selection problem chooses a subset of these rules that satisfies a specification (such as an input-output example). Building off prior work using counterexample-guided inductive synthesis, we present a progression of three solver-based approaches for solving Datalog synthesis-as-rule-selection problems. Two of our approaches offer some advantages over existing approaches, and can be used more generally to solve arbitrary SMT formulas containing Datalog predicates; the third—an encoding into standard, off-the-shelf answer set programming (ASP)—leads to significant speedups (∼ 9× geomean) over the state of the art while synthesizing higher quality programs. Our progression of solutions explores the space of interactions between SAT/SMT and Datalog, identifying ASP as a promising tool for working with and reasoning about Datalog. Along the way, we identify Datalog programs as monotonic SMT theories, which enjoy particularly efficient interactions in SMT; our plugins for popular SMT solvers make it easy to load an arbitrary Datalog program into the SMT solver as a custom monotonic theory. Finally, we evaluate our approaches using multiple underlying solvers to provide a more thorough and nuanced comparison against the current state of the art.
Satisfiability modulo theories (SMT) solving has become a critical part of many static analyses, including symbolic execution, refinement type checking, and model checking. We propose Formulog, a domain-specific language that makes it possible to write a range of SMT-based static analyses in a way that is both close to their formal specifications and amenable to high-level optimizations and efficient evaluation. Formulog extends the logic programming language Datalog with a first-order functional language and mechanisms for representing and reasoning about SMT formulas; a novel type system supports the construction of expressive formulas, while ensuring that neither normal evaluation nor SMT solving goes wrong. Our case studies demonstrate that a range of SMT-based analyses can naturally and concisely be encoded in Formulog, and that — thanks to this encoding — high-level Datalog-style optimizations can be automatically and advantageously applied to these analyses.
Satisfiability modulo theories (SMT) solving has become a critical part of many static analyses, including symbolic execution, refinement type checking, and model checking. We propose Formulog, a domain-specific language that makes it possible to write a range of SMT-based static analyses in a way that is both close to their formal specifications and amenable to high-level optimizations and efficient evaluation. Formulog extends the logic programming language Datalog with a first-order functional language and mechanisms for representing and reasoning about SMT formulas; a novel type system supports the construction of expressive formulas, while ensuring that neither normal evaluation nor SMT solving goes wrong. Our case studies demonstrate that a range of SMT-based analyses can naturally and concisely be encoded in Formulog, and that -- thanks to this encoding -- high-level Datalog-style optimizations can be automatically and advantageously applied to these analyses.
With its combination of Datalog, SMT solving, and functional programming, the language Formulog provides an appealing mix of features for implementing SMT-based static analyses (e.g., refinement type checking, symbolic execution) in a natural, declarative way. At the same time, the performance of its custom Datalog solver can be an impediment to using Formulog beyond prototyping—a common problem for Datalog variants that aspire to solve large problem instances. In this work we speed up Formulog evaluation, with some surprising results: while 2.2× speedups can be obtained by using the conventional techniques for high-performance Datalog (e.g., compilation, specialized data structures), the big wins come by abandoning the central assumption in modern performant Datalog engines, semi-naive Datalog evaluation. In the place of semi-naive evaluation, we develop eager evaluation, a concurrent Datalog evaluation algorithm that explores the logical inference space via a depth-first traversal order. In practice, eager evaluation leads to an advantageous distribution of Formulog’s SMT workload to external SMT solvers and improved SMT solving times: our eager evaluation extensions to the Formulog interpreter and Soufflé’s code generator achieve mean 5.2× and 7.6× speedups, respectively, over the optimized code generated by off-the-shelf Soufflé on SMT-heavy Formulog benchmarks. All in all, using compilation and eager evaluation (as appropriate), Formulog implementations of refinement type checking, bottom-up pointer analysis, and symbolic execution achieve speedups on 20 out of 23 benchmarks over previously published, hand-tuned analyses written in F ♯ , Java, and C++, providing strong evidence that Formulog can be the basis of a realistic platform for SMT-based static analysis. Moreover, our experience adds nuance to the conventional wisdom that traditional semi-naive evaluation is the one-size-fits-all best Datalog evaluation algorithm for static analysis workloads.
To handle AI tasks that combine perception and logical reasoning, recent work introduces Neurosymbolic Deep Neural Networks (NS-DNNs), which contain -- in addition to traditional neural layers -- symbolic layers: symbolic expressions (e.g., SAT formulas, logic programs) that are evaluated by symbolic solvers during inference. We identify and formalize an intuitive, high-level principle that can guide the design and analysis of NS-DNNs: symbol correctness, the correctness of the intermediate symbols predicted by the neural layers with respect to a (generally unknown) ground-truth symbolic representation of the input data. We demonstrate that symbol correctness is a necessary property for NS-DNN explainability and transfer learning (despite being in general impossible to train for). Moreover, we show that the framework of symbol correctness provides a precise way to reason and communicate about model behavior at neural-symbolic boundaries, and gives insight into the fundamental tradeoffs faced by NS-DNN training algorithms. In doing so, we both identify significant points of ambiguity in prior work, and provide a framework to support further NS-DNN developments.
Bitcoin is one of the first decentralized, peer to peer, payment systems based on the so-called Proof-of-Work (PoW).PoW is an algorithm that requires the computation of a hard function in order to gain access to a resource but, at the same time, the correctness of the computed result should be easily checked.The use of a PoW removes the necessity of a centralized third party and so the consistency of the network may be altered directly by the involved users.Peers, to solve the PoW more efficiently, usually organize themselves into mining pools, to increase the overall computational power: this situation, unfortunately, leads to a network centralization.In this paper we consider two typical scenarios of a Bitcoin network and we model them by probabilistic logic programming (PLP): the centralization of the hashing power by large pools and the "double spending attack".In the first one, we verify the effectiveness of a protocol that attempts to discourage the formation of large pools.In the second one, we compute the probability of success of an attacker.Both scenarios are modeled using the PLP package cplint.
Datalog has become a popular language for writing static analyses. Because Datalog is very limited, some implementations of Datalog for static analysis have extended it with new language features. However, even with these features it is hard or impossible to express a large class of analyses because they use logical formulae to represent program state. FormuLog fills this gap by extending Datalog to represent, manipulate, and reason about logical formulae. We have used FormuLog to implement declarative versions of symbolic execution and abstract model checking, analyses previously out of the scope of Datalog-based languages. While this paper focuses on the design of FormuLog and one of the analyses we have implemented in it, it also touches on a prototype implementation of the language and identifies performance optimizations that we believe will be necessary to scale FormuLog to real-world static analysis problems.
By combining Datalog, SMT solving, and functional programming, the language Formulog provides an appealing mix of features for implementing SMT-based static analyses (e.g., refinement type checking, symbolic execution) in a natural, declarative way. At the same time, the performance of its custom Datalog solver can be an impediment to using Formulog beyond prototyping -- a common problem for Datalog variants that aspire to solve large problem instances. In this work we speed up Formulog evaluation, with surprising results: while 2.2x speedups are obtained by using the conventional techniques for high-performance Datalog (e.g., compilation, specialized data structures), the big wins come by abandoning the central assumption in modern performant Datalog engines, semi-naive Datalog evaluation. In its place, we develop eager evaluation, a concurrent Datalog evaluation algorithm that explores the logical inference space via a depth-first traversal order. In practice, eager evaluation leads to an advantageous distribution of Formulog's SMT workload to external SMT solvers and improved SMT solving times: our eager evaluation extensions to the Formulog interpreter and Souffl\'e's code generator achieve mean 5.2x and 7.6x speedups, respectively, over the optimized code generated by off-the-shelf Souffl\'e on SMT-heavy Formulog benchmarks. Using compilation and eager evaluation, Formulog implementations of refinement type checking, bottom-up pointer analysis, and symbolic execution achieve speedups on 20 out of 23 benchmarks over previously published, hand-tuned analyses written in F#, Java, and C++, providing strong evidence that Formulog can be the basis of a realistic platform for SMT-based static analysis. Moreover, our experience adds nuance to the conventional wisdom that semi-naive evaluation is the one-size-fits-all best Datalog evaluation algorithm for static analysis workloads.