Practical dynamic reconstruction of control flow graphs

2020 
The automatic recovery of a program’s high-level representation from its binary version is a well-studied problem in programming languages. However, most solutions to this problem are purely static: techniques such as dataflow analysis or type inference are used to convert the bytes that constitute the executable code back into a control flow graph (CFG). This work departs from that modus operandi to show that dynamic analysis can be effective and useful, both as a standalone technique and as a way to enhance the precision of static approaches. The experimental results provide evidence that completeness, i.e., the ability to conclude that the entire CFG has been discovered, is achievable for many functions in industrial-strength benchmarks. Experiments also indicate that dynamic information greatly enhances the ability of DynInst, a state-of-the-art binary reconstructor, to deal with code stripped of debugging information. These results were obtained with CFGgrind, a new implementation of a dynamic code reconstructor built on top of valgrind. On cBench, CFGgrind is 9% faster than callgrind, the valgrind tool that tracks the targets of function calls, and it is 7% faster on SPEC CPU2017. CFGgrind recovers the complete CFG of 40% of all the procedures invoked during the standard execution of the SPEC CPU2017 programs, and of 37% of those in cBench. When combined with CFGgrind, DynInst finds 15% more CFGs for cBench and 7% more CFGs for SPEC CPU2017. Finally, CFGgrind is more than 7 times faster than DCFG, a CFG reconstructor from Intel, and 1.28 times faster than bfTrace, a CFG reconstructor used in research. CFGgrind is also more precise than these two tools: it handles operating system signals, code shared among functions, and unaligned instructions, and it supports multi-threaded programs, exact profiling, and incremental refinement.
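To make the idea of dynamic CFG reconstruction and completeness concrete, here is a minimal Python sketch under simplifying assumptions: a trace is a sequence of basic blocks executed within one function, each block's terminator kind and statically decodable successors are known, and a CFG is treated as complete when every statically known successor has been observed and no block ends in an indirect branch. The `Block` and `DynamicCFG` names and this completeness rule are hypothetical illustrations, not CFGgrind's actual data structures or criterion.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Block:
    """Hypothetical model of one basic block observed in a trace."""
    addr: int
    kind: str                    # "cond", "jump", "ret" or "indirect"
    static_targets: tuple = ()   # successors decodable from the branch instruction


@dataclass
class DynamicCFG:
    """Hypothetical per-function CFG built only from executed blocks."""
    blocks: dict = field(default_factory=dict)  # addr -> Block
    edges: set = field(default_factory=set)     # (src_addr, dst_addr)

    def record(self, trace):
        """Add every executed block and every edge between consecutive blocks.

        Calling this on several traces accumulates edges across runs.
        """
        prev = None
        for blk in trace:
            self.blocks[blk.addr] = blk
            # A return leaves the function, so the next block starts a new invocation.
            if prev is not None and prev.kind != "ret":
                self.edges.add((prev.addr, blk.addr))
            prev = blk

    def is_complete(self):
        """Simplified completeness test (an assumption, not CFGgrind's rule)."""
        for blk in self.blocks.values():
            # The target set of an indirect branch cannot be bounded from a trace.
            if blk.kind == "indirect":
                return False
            # Every statically known successor must have been executed at least once.
            for target in blk.static_targets:
                if (blk.addr, target) not in self.edges:
                    return False
        return True


# Usage: a diamond-shaped function explored over two runs.
entry = Block(0x10, "cond", (0x20, 0x30))
left = Block(0x20, "jump", (0x40,))
right = Block(0x30, "jump", (0x40,))
exit_ = Block(0x40, "ret")

cfg = DynamicCFG()
cfg.record([entry, left, exit_])
print(cfg.is_complete())   # False: edge 0x10 -> 0x30 never executed
cfg.record([entry, right, exit_])
print(cfg.is_complete())   # True
```

Because `record` can be called on successive traces, observed edges accumulate across runs, which loosely mirrors the idea of refining a dynamically recovered CFG incrementally until it can be declared complete.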