VILO: a rapid learning nearest-neighbor classifier for malware triage

2013 
VILO is a lazy learner system designed for malware classification and triage. It implements a nearest neighbor (NN) algorithm with similarities computed over Term Frequency $$\times $$ Inverse Document Frequency (TFIDF) weighted opcode mnemonic permutation features (N-perms). Being an NN-classifier, VILO makes minimal structural assumptions about class boundaries, and thus is well suited for the constantly changing malware population. This paper presents an extensive study of application of VILO in malware analysis. Our experiments demonstrate that (a) VILO is a rapid learner of malware families, i.e., VILO's learning curve stabilizes at high accuracies quickly (training on less than 20 variants per family is sufficient); (b) similarity scores derived from TDIDF weighted features should primarily be treated as ordinal measurements; and (c) VILO with N-perm feature vectors outperforms traditional N-gram feature vectors when used to classify real-world malware into their respective families.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    41
    References
    26
    Citations
    NaN
    KQI
    []