A Biosequence-Based Approach to Software Characterization

Christopher S. Oehmen, Elena S. Peterson, Aaron R. Phillips, Darren S. Curtis

2016

For many applications, it is desirable to have a process for recognizing when software binaries are closely related without requiring them to be identical or to share identical segments. Doing so in a dynamic environment is nontrivial, however, because most approaches to software similarity either require extensive and time-consuming analysis of a binary or fail to recognize executables that are similar but not identical. We present a novel biosequence-based method for quantifying the similarity of executable binaries. In an example application on large-scale multi-author codes, we show that 1) the biosequence-based method achieves statistical performance better than 90% of ideal in recognizing and distinguishing among a collection of real-world high-performance computing applications, and 2) using family-tree analysis to tune identification for a code subfamily can achieve better than 99% of ideal performance.
