A Fast Algorithm for Constructing Phylogenetic Trees with Application to IoT Malware Clustering.

2019 
For efficiently handling thousands of malware specimens, we aim to quickly and automatically categorize those into malware families. A solution for this could be the neighbor-joining method using NCD (Normalized Compression Distance) as similarity of malware. It creates a phylogenetic tree of malware based on the NCDs between malware binaries for clustering. However, it is frustratingly slow because it requires \((N^2+N)/2\) compression attempts for the NCDs, where N is the number of given specimens. For fast clustering, this paper presents an algorithm for efficiently constructing a phylogenetic tree by greatly reducing compression attempts. The key idea to do so is not to construct a tree of N specimens all at once. Instead, it divides N specimens into temporal clusters in advance, constructs a small tree for each temporal cluster, and joins the trees as a united tree. Intuitively, separately constructing small trees requires a much smaller number of compression attempts than \((N^2+N)/2\). With experiments using 4,109 in-the-wild malware specimens, we confirm that our algorithm achieved clustering 22 times faster than the neighbor-joining method with a good accuracy of 97%.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    0
    References
    2
    Citations
    NaN
    KQI
    []