Aaron Bembenek

University of Melbourne

Author Statistics

Papers

Citation

H-Index

i-10 index

Research Trends

Author Order

Document Type

Co-Authors

Stephen Chong

Harvard University Press

Michael Greenberg

Stevens Institute of Technology

Amílcar Cardoso

University of Coimbra

Pedro Martins

University of Coimbra

Toby Murray

University of Melbourne

Nada Amin

Harvard University Press

João Carlos Gonçalves

Polytechnic Institute of Viseu

Michael Ballantyne

Northeastern University

Cooperative Institutions

University of Coimbra

University of Minho

Harvard University Press

Harvard University

Polytechnic Institute of Viseu

SRI International

Software Engineering Institute

Georgia Institute of Technology

Microsoft Research (United Kingdom)

KU Leuven

Author Statistics

Papers

Citation

H-Index

i-10 index

Research Field

From SMT to ASP: Solver-Based Approaches to Solving Datalog Synthesis-as-Rule-Selection Problems

Proceedings of the ACM on Programming Languages (2023)

Aaron Bembenek Michael Greenberg Stephen Chong

Given a set of candidate Datalog rules, the Datalog synthesis-as-rule-selection problem chooses a subset of these rules that satisfies a specification (such as an input-output example). Building off prior work using counterexample-guided inductive synthesis, we present a progression of three solver-based approaches for solving Datalog synthesis-as-rule-selection problems. Two of our approaches offer some advantages over existing approaches, and can be used more generally to solve arbitrary SMT formulas containing Datalog predicates; the third—an encoding into standard, off-the-shelf answer set programming (ASP)—leads to significant speedups (∼ 9× geomean) over the state of the art while synthesizing higher quality programs. Our progression of solutions explores the space of interactions between SAT/SMT and Datalog, identifying ASP as a promising tool for working with and reasoning about Datalog. Along the way, we identify Datalog programs as monotonic SMT theories, which enjoy particularly efficient interactions in SMT; our plugins for popular SMT solvers make it easy to load an arbitrary Datalog program into the SMT solver as a custom monotonic theory. Finally, we evaluate our approaches using multiple underlying solvers to provide a more thorough and nuanced comparison against the current state of the art.

Datalog

Solver

Satisfiability Modulo Theories

Answer Set Programming

Deductive database

10.1145/3571200

Cite

Citations (2)

Going into Greater Depth in the Quest for Hidden Frames.

ICCC (2019)

João Carlos Gonçalves Aaron Bembenek Pedro Martins Amílcar Cardoso

Source

Cite

Citations (0)

Formulog: Datalog for SMT-based static analysis

Proceedings of the ACM on Programming Languages (2020)

Aaron Bembenek Michael Greenberg Stephen Chong

Satisfiability modulo theories (SMT) solving has become a critical part of many static analyses, including symbolic execution, refinement type checking, and model checking. We propose Formulog, a domain-specific language that makes it possible to write a range of SMT-based static analyses in a way that is both close to their formal specifications and amenable to high-level optimizations and efficient evaluation. Formulog extends the logic programming language Datalog with a first-order functional language and mechanisms for representing and reasoning about SMT formulas; a novel type system supports the construction of expressive formulas, while ensuring that neither normal evaluation nor SMT solving goes wrong. Our case studies demonstrate that a range of SMT-based analyses can naturally and concisely be encoded in Formulog, and that — thanks to this encoding — high-level Datalog-style optimizations can be automatically and advantageously applied to these analyses.

Datalog

Satisfiability Modulo Theories

Satisfiability

10.1145/3428209

Cite

Citations (22)

Formulog: Datalog for SMT-Based Static Analysis (Extended Version)

arXiv (Cornell University) (2020)

Aaron Bembenek Michael Greenberg Stephen Chong

Satisfiability modulo theories (SMT) solving has become a critical part of many static analyses, including symbolic execution, refinement type checking, and model checking. We propose Formulog, a domain-specific language that makes it possible to write a range of SMT-based static analyses in a way that is both close to their formal specifications and amenable to high-level optimizations and efficient evaluation. Formulog extends the logic programming language Datalog with a first-order functional language and mechanisms for representing and reasoning about SMT formulas; a novel type system supports the construction of expressive formulas, while ensuring that neither normal evaluation nor SMT solving goes wrong. Our case studies demonstrate that a range of SMT-based analyses can naturally and concisely be encoded in Formulog, and that -- thanks to this encoding -- high-level Datalog-style optimizations can be automatically and advantageously applied to these analyses.

Datalog

Satisfiability Modulo Theories

Satisfiability

10.48550/arxiv.2009.08361

Cite

Citations (0)

Making Formulog Fast: An Argument for Unconventional Datalog Evaluation

Proceedings of the ACM on Programming Languages (2024)

Aaron Bembenek Michael Greenberg Stephen Chong

With its combination of Datalog, SMT solving, and functional programming, the language Formulog provides an appealing mix of features for implementing SMT-based static analyses (e.g., refinement type checking, symbolic execution) in a natural, declarative way. At the same time, the performance of its custom Datalog solver can be an impediment to using Formulog beyond prototyping—a common problem for Datalog variants that aspire to solve large problem instances. In this work we speed up Formulog evaluation, with some surprising results: while 2.2× speedups can be obtained by using the conventional techniques for high-performance Datalog (e.g., compilation, specialized data structures), the big wins come by abandoning the central assumption in modern performant Datalog engines, semi-naive Datalog evaluation. In the place of semi-naive evaluation, we develop eager evaluation, a concurrent Datalog evaluation algorithm that explores the logical inference space via a depth-first traversal order. In practice, eager evaluation leads to an advantageous distribution of Formulog’s SMT workload to external SMT solvers and improved SMT solving times: our eager evaluation extensions to the Formulog interpreter and Soufflé’s code generator achieve mean 5.2× and 7.6× speedups, respectively, over the optimized code generated by off-the-shelf Soufflé on SMT-heavy Formulog benchmarks. All in all, using compilation and eager evaluation (as appropriate), Formulog implementations of refinement type checking, bottom-up pointer analysis, and symbolic execution achieve speedups on 20 out of 23 benchmarks over previously published, hand-tuned analyses written in F ♯ , Java, and C++, providing strong evidence that Formulog can be the basis of a realistic platform for SMT-based static analysis. Moreover, our experience adds nuance to the conventional wisdom that traditional semi-naive evaluation is the one-size-fits-all best Datalog evaluation algorithm for static analysis workloads.

Datalog

Argument (complex analysis)

10.1145/3689754

Cite

Citations (0)

Symbol Correctness in Deep Neural Networks Containing Symbolic Layers

arXiv (Cornell University) (2024)

Aaron Bembenek Toby Murray

To handle AI tasks that combine perception and logical reasoning, recent work introduces Neurosymbolic Deep Neural Networks (NS-DNNs), which contain -- in addition to traditional neural layers -- symbolic layers: symbolic expressions (e.g., SAT formulas, logic programs) that are evaluated by symbolic solvers during inference. We identify and formalize an intuitive, high-level principle that can guide the design and analysis of NS-DNNs: symbol correctness, the correctness of the intermediate symbols predicted by the neural layers with respect to a (generally unknown) ground-truth symbolic representation of the input data. We demonstrate that symbol correctness is a necessary property for NS-DNN explainability and transfer learning (despite being in general impossible to train for). Moreover, we show that the framework of symbol correctness provides a precise way to reason and communicate about model behavior at neural-symbolic boundaries, and gives insight into the fundamental tradeoffs faced by NS-DNN training algorithms. In doing so, we both identify significant points of ambiguity in prior work, and provide a framework to support further NS-DNN developments.

Symbol (formal)

The Symbolic

10.48550/arxiv.2402.03663

Cite

Citations (0)

Proceedings 36th International Conference on Logic Programming (Technical Communications)

Electronic Proceedings in Theoretical Computer Science (2020)

Aaron Bembenek Michael Ballantyne Michael Greenberg Nada Amin

Bitcoin is one of the first decentralized, peer to peer, payment systems based on the so-called Proof-of-Work (PoW).PoW is an algorithm that requires the computation of a hard function in order to gain access to a resource but, at the same time, the correctness of the computed result should be easily checked.The use of a PoW removes the necessity of a centralized third party and so the consistency of the network may be altered directly by the involved users.Peers, to solve the PoW more efficiently, usually organize themselves into mining pools, to increase the overall computational power: this situation, unfortunately, leads to a network centralization.In this paper we consider two typical scenarios of a Bitcoin network and we model them by probabilistic logic programming (PLP): the centralization of the hashing power by large pools and the "double spending attack".In the first one, we verify the effectiveness of a protocol that attempts to discourage the formation of large pools.In the second one, we compute the probability of success of an attacker.Both scenarios are modeled using the PLP package cplint.

10.4204/eptcs.325.7

Cite

Citations (0)

FormuLog: Datalog for static analysis involving logical formulae

arXiv (Cornell University) (2018)

Aaron Bembenek Stephen Chong

Datalog has become a popular language for writing static analyses. Because Datalog is very limited, some implementations of Datalog for static analysis have extended it with new language features. However, even with these features it is hard or impossible to express a large class of analyses because they use logical formulae to represent program state. FormuLog fills this gap by extending Datalog to represent, manipulate, and reason about logical formulae. We have used FormuLog to implement declarative versions of symbolic execution and abstract model checking, analyses previously out of the scope of Datalog-based languages. While this paper focuses on the design of FormuLog and one of the analyses we have implemented in it, it also touches on a prototype implementation of the language and identifies performance optimizations that we believe will be necessary to scale FormuLog to real-world static analysis problems.

Datalog

Scope (computer science)

Implementation

Deductive database

Logical analysis

10.48550/arxiv.1809.06274

Cite

Citations (0)

Making Formulog Fast: An Argument for Unconventional Datalog Evaluation (Extended Version)

arXiv (Cornell University) (2024)

Aaron Bembenek Michael Greenberg Stephen Chong

By combining Datalog, SMT solving, and functional programming, the language Formulog provides an appealing mix of features for implementing SMT-based static analyses (e.g., refinement type checking, symbolic execution) in a natural, declarative way. At the same time, the performance of its custom Datalog solver can be an impediment to using Formulog beyond prototyping -- a common problem for Datalog variants that aspire to solve large problem instances. In this work we speed up Formulog evaluation, with surprising results: while 2.2x speedups are obtained by using the conventional techniques for high-performance Datalog (e.g., compilation, specialized data structures), the big wins come by abandoning the central assumption in modern performant Datalog engines, semi-naive Datalog evaluation. In its place, we develop eager evaluation, a concurrent Datalog evaluation algorithm that explores the logical inference space via a depth-first traversal order. In practice, eager evaluation leads to an advantageous distribution of Formulog's SMT workload to external SMT solvers and improved SMT solving times: our eager evaluation extensions to the Formulog interpreter and Souffl\'e's code generator achieve mean 5.2x and 7.6x speedups, respectively, over the optimized code generated by off-the-shelf Souffl\'e on SMT-heavy Formulog benchmarks. Using compilation and eager evaluation, Formulog implementations of refinement type checking, bottom-up pointer analysis, and symbolic execution achieve speedups on 20 out of 23 benchmarks over previously published, hand-tuned analyses written in F#, Java, and C++, providing strong evidence that Formulog can be the basis of a realistic platform for SMT-based static analysis. Moreover, our experience adds nuance to the conventional wisdom that semi-naive evaluation is the one-size-fits-all best Datalog evaluation algorithm for static analysis workloads.

Datalog

Argument (complex analysis)

10.48550/arxiv.2408.14017

Cite

Citations (0)