Using Bipartite Anomaly Features for Cyber Security Applications

2015 
In this paper we use anomaly scores derived from a technique for bipartite graphs as features for a supervised machine learning algorithm for two cyber security problems: classifying Short Message Service (SMS) text messages as either spam or non-spam, and detecting malicious lateral movement within a network. Although the problems are disparate, both can be represented as bipartite graphs (messages and terms, users and computers), so we can compute bipartite anomaly scores for each. The bipartite anomaly scores by themselves are not very predictive, but used as auxiliary features they can improve the receiver operating characteristic (ROC) curve of a supervised classifier. We examine the UCI SMS Spam Collection Data Set for the spam problem and an authentication graph from Los Alamos National Laboratory for the lateral movement problem. We create features by dimensionality reduction through principal component analysis (PCA) on the message-term or user-computer matrix, and then augment those features with anomaly scores. By using the anomaly scores we are able to improve the area under the ROC curve (AUC) by up to 27.5% for the spam data and 21.4% for the authentication data.
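As a rough illustration of the augmentation step, the following Python sketch computes a per-node anomaly score on a bipartite edge list and appends it to an existing feature vector. The abstract does not specify the anomaly scoring technique, so the score used here (mean surprisal of a node's neighbors) is a hypothetical stand-in and not the paper's method. The function names, the toy edges, and the placeholder feature values are likewise assumptions made for illustration; the reduced PCA features are assumed to be computed elsewhere.

```python
import math
from collections import Counter


def bipartite_anomaly_scores(edges):
    """Score each left node (message or user) of a bipartite edge list.

    Hypothetical stand-in score, not the paper's technique: the mean
    surprisal -log(deg(r) / n_left) over the right nodes r (terms or
    computers) that the left node touches. Rare neighbors raise the score.
    """
    edges = set(edges)  # ignore duplicate edges
    right_degree = Counter(r for _, r in edges)
    n_left = len({l for l, _ in edges})
    total, count = {}, Counter()
    for l, r in edges:
        total[l] = total.get(l, 0.0) - math.log(right_degree[r] / n_left)
        count[l] += 1
    return {l: total[l] / count[l] for l in total}


def augment(features, scores):
    """Append the anomaly score as one extra column per feature vector."""
    return {k: list(v) + [scores.get(k, 0.0)] for k, v in features.items()}


def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney statistic.

    Equals the probability that a positive example outranks a negative
    one, counting ties as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy message-term edges (made up for illustration only).
edges = [
    ("m1", "hello"), ("m1", "meeting"),
    ("m2", "hello"), ("m2", "lunch"),
    ("m3", "hello"), ("m3", "free"), ("m3", "prize"),
]
scores = bipartite_anomaly_scores(edges)

# Placeholder two-dimensional features standing in for PCA components.
reduced = {"m1": [0.1, 0.2], "m2": [0.0, 0.3], "m3": [0.9, 0.4]}
augmented = augment(reduced, scores)
```

The augmented vectors can then be passed to any supervised classifier, and `auc` gives one way to compare classifier scores with and without the extra column.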