A Deep Learning Approach to the Malware Classification Problem using Autoencoders.

2019 
Detecting malicious code and categorizing it into families has become an increasingly difficult task. Malware exploits vulnerabilities and employs sophisticated techniques to evade detection and subsequent classification, challenging cybersecurity teams, governments, enterprises, and ordinary users, and causing incalculable losses every year. Traditional machine learning algorithms have been applied to the problem, but these methods rely heavily on domain expertise to succeed. Deep learning methods depend less on feature engineering: they discover the important features directly from the raw data and recognize patterns that humans usually cannot. This work presents a deep learning approach to multi-class malware classification based on a classifier with unsupervised pre-training, using the frequencies of opcodes and their operands as raw data and ignoring any knowledge that could be drawn from known features of the malware families. The results confirm that the approach is successful: our best model achieved a macro-F1 of 93.14%, a competitive result compared with the best-known classifier, given that it uses less information about the malware.
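To make the two quantities named in the abstract concrete, the following is a minimal sketch of how opcode frequencies and a macro-averaged F1 score could be computed. It is not the authors' implementation: the function names `opcode_frequencies` and `macro_f1`, the fixed opcode vocabulary, and the normalization by total count are all assumptions made for illustration, and operand frequencies are left out for brevity.

```python
from collections import Counter


def opcode_frequencies(opcodes, vocabulary):
    """Relative frequency of each vocabulary opcode in one sample.

    Assumption: features are counts normalized by the number of
    in-vocabulary opcodes; the paper may define them differently.
    """
    counts = Counter(opcodes)
    total = sum(counts[op] for op in vocabulary)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[op] / total for op in vocabulary]


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data, not taken from the paper
features = opcode_frequencies(["mov", "push", "mov", "call"],
                              ["mov", "push", "call", "jmp"])
score = macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"])
```

Because macro-F1 weights every family equally regardless of its size, it penalizes a classifier that performs well only on the most common malware families.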