Identifying Cross-Version Function Similarity Using Contextual Features

2020 
The identification of similar functions in malware assists analysis by supporting the exclusion of functions that have been previously analysed, allows the identification of new variants, supports authorship attribution, and the analysis of malware phylogeny. A function's context is a set comprising the function itself and all the program functions that may be executed when this function is called. Contextual features consist of data that is extracted from the functions contained in the function context. This paper presents a novel technique called Cross Version Contextual Function Similarity (CVCFS) to identify function pairs in two programs using features based on both individual functions and function context. The CVCFS technique uses Support Vector Machine (SVM) machine learning of function similarity features to pre-filter function pairs and then applies an edit distance technique using function semantics to reduce false positives. A case study is provided where individual and contextual features are extracted from three versions of Zeus malware. The SVM pre-filtering, followed by the use of an edit distance technique to filter false positives, gives a function pair identification accuracy of 85 percent.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    14
    References
    2
    Citations
    NaN
    KQI
    []