Signature limits: an entire map of clone features and their discovery in nearly linear time

2016 
We address an increasingly critical problem of identifying the potential signatures for identifying a given family of malware or unwanted software (i.e., or generally any corpus of artifacts of unknown provenance). We address this with a novel methodology designed to create an entire and complete maps of software code clones (copy features in data). We report on a practical methodology, which employs enhanced suffix data structures and partial orderings of clones to compute a compact representation of most interesting clones features in data. The enumeration of clone features is useful for malware triage and prioritization when human exploration, testing and verification is the most costly factor. We further show that the enhanced arrays may be used for discovery of provenance relations in data and we introduce two distinct Jaccard similarity coefficients to measure code similarity in binary artifacts. We illustrate the use of these tools on real malware data including a retro-diction experiment for measuring and enumerating evidence supporting common provenance in Stuxnet and Duqu. The results indicate the practicality and efficacy of mapping completely the clone features in data.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    35
    References
    1
    Citations
    NaN
    KQI
    []