Upcompiling Legacy Code to Java

2012 
This thesis investigates the process of “upcompilation”, the transformation of a binary program back into source code. Unlike a decompiler, an upcompiler produces code in a language with a higher level of abstraction than the one the program was originally written in. It thus supports the migration of legacy applications with missing source code to a virtual machine. The result of the thesis is a deeper understanding of the problems that occur in upcompilers. To identify these problems, we wrote an upcompiler that transforms simple x86 binary programs into Java source code. We recover local variables, function arguments, and return values from registers and memory. The expression reduction phase reduces the number of variables. We detect calls to library functions and translate memory allocation and basic input/output operations to Java constructs. The structuring phase transforms the control flow graph into an abstract syntax tree. We type the variables as integers or pointers to integers. To optimize the produced code for readability, we developed a data-flow-aware coalescing algorithm. The obstacles discovered include type recovery, structuring, handling of obfuscated code, pointer representation in Java, and optimization for readability, to name only a few. For most of them we refer to related literature. We show that upcompilation is possible and identify where the problems lie. More investigation and implementation effort is needed to tackle specific problems and to make upcompilation applicable to real-world programs.
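To make the expression reduction phase more concrete, the following is a minimal sketch of one common way such a phase can work: forward substitution of temporaries that are read exactly once. It is not taken from the thesis, and the thesis's actual algorithm may differ. The class `ExpressionReduction`, the type `Assign`, and the token-list representation are hypothetical names chosen for illustration. The sketch assumes that every variable is assigned exactly once (SSA-like form) and that expressions have no side effects, so moving a definition to its single use cannot change the result.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only: inlines temporaries that are read exactly once. */
public class ExpressionReduction {

    /** One three-address statement, e.g. "t1 = a + b" (hypothetical representation). */
    public static final class Assign {
        public final String target;
        public final List<String> tokens; // right-hand side, one token per operand or operator

        public Assign(String target, List<String> tokens) {
            this.target = target;
            this.tokens = tokens;
        }

        @Override
        public String toString() {
            return target + " = " + String.join(" ", tokens);
        }
    }

    /**
     * Assumes SSA-like input (each target assigned once) and side-effect-free
     * expressions; under these assumptions substitution preserves semantics.
     */
    public static List<Assign> reduce(List<Assign> code) {
        // Count how often each name is read on any right-hand side.
        Map<String, Integer> uses = new HashMap<>();
        for (Assign s : code) {
            for (String tok : s.tokens) {
                uses.merge(tok, 1, Integer::sum);
            }
        }

        // Definitions of single-use variables, waiting to be inlined at their use.
        Map<String, List<String>> pending = new HashMap<>();
        List<Assign> out = new ArrayList<>();

        for (Assign s : code) {
            List<String> rhs = new ArrayList<>();
            for (String tok : s.tokens) {
                List<String> def = pending.get(tok);
                if (def == null) {
                    rhs.add(tok);            // ordinary operand or operator
                } else if (def.size() > 1) {
                    rhs.add("(");            // parenthesize compound expressions
                    rhs.addAll(def);
                    rhs.add(")");
                } else {
                    rhs.addAll(def);         // a plain copy needs no parentheses
                }
            }
            if (uses.getOrDefault(s.target, 0) == 1) {
                pending.put(s.target, rhs);  // read once: drop the variable, inline later
            } else {
                out.add(new Assign(s.target, rhs)); // read 0 or 2+ times: keep the statement
            }
        }
        return out;
    }
}
```

Under these assumptions, the three statements `t1 = a + b`, `t2 = t1 * 2`, `r = t2 - c` reduce to the single statement `r = ( ( a + b ) * 2 ) - c`, removing two variables. A variable that is read more than once, as in `r = t1 * t1`, is kept, since inlining it would duplicate the computation.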