From big data to knowledge: A spatio-temporal approach to malware detection

2017 
Abstract The deployment of endpoint protection has gradually migrated from individual clients to remote cloud servers, a paradigm termed cloud-based security service. This new paradigm of security defense produces a large amount of data and log files, and motivates data-driven techniques for detecting malicious software. This paper conducts an empirical study on the log of a real cloud-based security service to characterize the occurrence of executable files on end hosts, covering 124,782 benign and 113,305 malicious executable files that occurred on 165,549,417 end hosts. The end hosts on which an executable file occurs and the timestamps at which it occurs provide insights into the distribution of software in the wild from spatial and temporal perspectives, respectively. We also investigate the strategies behind these characterizations, and observe a preferential attachment process and periodicity in file occurrence on end hosts. The different occurrence patterns observed for benign and malicious files inspire a new scalable approach to malware detection. We learn from the characterizations that associated files sharing more spatial and temporal information are more likely to have the same label, either benign or malicious. Thus, we devise a graph-based semi-supervised learning algorithm for real-time malware detection that takes into account the spatio-temporal information of the distribution of executable files. Experimental results demonstrate that our approach improves malware detection performance by 14.7% on average over previous techniques.
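The idea that files sharing hosts tend to share labels can be illustrated with a small label propagation sketch. This is not the paper's algorithm: it is a toy example that uses only spatial information (the overlap of host sets, measured with Jaccard similarity) as edge weights, and all names here (`hosts_by_file`, `seed_labels`, `propagate_labels`, the file and host identifiers) are hypothetical. Temporal information could, under similar assumptions, be folded into the edge weight as well.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def propagate_labels(hosts_by_file, seed_labels, iterations=20):
    """Toy label propagation over a file co-occurrence graph.

    hosts_by_file: dict file_id -> set of host ids where the file was seen
    seed_labels:   dict file_id -> +1.0 (malicious) or -1.0 (benign)
    Returns dict file_id -> score in [-1, 1]; the sign is the predicted label.
    """
    files = list(hosts_by_file)

    # Assumption: edge weight is the overlap of the two files' host sets.
    weights = defaultdict(dict)
    for i, f in enumerate(files):
        for g in files[i + 1:]:
            w = jaccard(hosts_by_file[f], hosts_by_file[g])
            if w > 0:
                weights[f][g] = w
                weights[g][f] = w

    # Unlabeled files start at 0.0 (no evidence either way).
    scores = {f: seed_labels.get(f, 0.0) for f in files}
    for _ in range(iterations):
        updated = {}
        for f in files:
            if f in seed_labels:
                updated[f] = seed_labels[f]  # keep known labels fixed
                continue
            total = sum(weights[f].values())
            if total == 0:
                updated[f] = 0.0  # isolated file: nothing to propagate
            else:
                # Weighted average of the neighbours' current scores.
                updated[f] = sum(
                    w * scores[g] for g, w in weights[f].items()
                ) / total
        scores = updated
    return scores


# Hypothetical data: two labeled files and two unlabeled ones.
hosts_by_file = {
    "good.exe": {"h1", "h2", "h3"},
    "bad.exe": {"h7", "h8", "h9"},
    "unknown_a.exe": {"h1", "h2"},
    "unknown_b.exe": {"h8", "h9"},
}
seed_labels = {"good.exe": -1.0, "bad.exe": 1.0}
scores = propagate_labels(hosts_by_file, seed_labels)
```

In this toy data, `unknown_a.exe` shares hosts only with the benign seed and `unknown_b.exe` only with the malicious one, so their scores take the corresponding signs. Comparing every pair of files is quadratic and would not scale to the dataset sizes quoted above; a real system would need a sparser graph construction.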