MEADE: Towards a Malicious Email Attachment Detection Engine

2018 
Email attachments are a growing delivery vector for malware. While machine learning (ML) has been successfully applied to portable executable (PE) malware detection, we ask: can we extend static ML approaches to detect malware across common email attachment file types, e.g., Office documents and Zip archives? To this end, we collected a dataset of over 5 million malicious/benign Microsoft Office documents, along with a smaller dataset that we use to provide more realistic estimates of thresholds for false positive rates on in-the-wild data. We also collected a dataset of approximately 500k malicious/benign Zip archives, on which we performed a separate evaluation. We analyzed predictive performance using 70/30 train/test time splits, evaluating feature and classifier types that have been applied successfully in commercial PE antimalware products and R&D contexts. Using deep neural networks and gradient boosted decision trees, we obtain ROC curves with > 0.99 AUC on both the Office document and Zip archive datasets. We also discuss deployment viability in various antimalware contexts.
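The evaluation protocol described above (a 70/30 train/test split by time, ROC AUC, and a decision threshold chosen for a target false positive rate) can be illustrated with a small, dependency-free Python sketch. The `Sample` record, its `first_seen` field, and the helper names are hypothetical; the abstract does not specify how the split or thresholds were computed, and a real evaluation would more likely use a library routine such as scikit-learn's `roc_curve`.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Sample:
    first_seen: float  # hypothetical field: time the file was first observed
    label: int         # 1 = malicious, 0 = benign


def time_split(samples: Sequence[Sample], train_frac: float = 0.7) -> Tuple[List[Sample], List[Sample]]:
    """Put the earliest train_frac of samples (by first_seen) in train, the rest in test."""
    ordered = sorted(samples, key=lambda s: s.first_seen)
    cut = round(len(ordered) * train_frac)  # split index, rounded to nearest integer
    return ordered[:cut], ordered[cut:]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random malicious sample outscores a random
    benign one (ties count as half). O(P*N) pairwise comparison, for illustration only."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one malicious and one benign sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def threshold_at_fpr(labels: Sequence[int], scores: Sequence[float], target_fpr: float) -> float:
    """Return a threshold t such that flagging samples with score > t marks at most
    target_fpr of the benign samples as malicious."""
    neg = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    if not neg:
        raise ValueError("need at least one benign sample")
    allowed = math.floor(target_fpr * len(neg))  # false positives we can tolerate
    if allowed >= len(neg):
        return -math.inf  # every sample may be flagged
    # Only benign scores strictly above neg[allowed] are flagged: at most `allowed` of them.
    return neg[allowed]


def false_positive_rate(labels: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """Fraction of benign samples whose score exceeds the threshold."""
    neg = [s for s, y in zip(scores, labels) if y == 0]
    return sum(1 for s in neg if s > threshold) / len(neg)
```

Splitting by first-seen time rather than at random is meant to keep newer samples out of training, so the test set better resembles files a deployed model would encounter after it was trained. Choosing the threshold on benign scores for a fixed target false positive rate mirrors how the abstract uses its smaller in-the-wild dataset to estimate thresholds, though the exact procedure here is an assumption.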