Malware Detection Based on New Implementations of the Moody-Darken Single-Layer Perceptron Architecture: When the Data Speak, Are We Listening?

2016 
Malware detection is an important cyber security problem: malware compromises computer system integrity, enabling the collection of sensitive information and the insertion of disruptive, malicious, and intrusive software. The problem has grown in importance with the proliferation of advanced technologies applied to cyber attacks and of actors eager to use them. We approach malware detection first as a binary classification problem, i.e., one class for malware and another for non-malware. We present a novel classifier that uses constrained low rank approximation as its core algorithmic innovation, generalizing the Moody-Darken single layer perceptron architecture of 1989; we call it the Generalized Moody-Darken Architecture (GMDA). We formulate the new algorithm as a nonconvex optimization problem for the hidden layer of the single layer perceptron and derive a constrained convex optimization problem for the output layer estimator. Our previous results have shown that the combined architecture achieves the classification performance of the support vector machine (SVM), but in an online methodology that scales well to massive-scale data. In addition, our new implementation works well for nonnegative data and has also been applied to Twitter sentiment classification. In this paper we focus on solving the classification problem in the appropriate domain for the data and show that this is critical for both the accuracy and the interpretation of the results. We also demonstrate that the data generation process should be appropriate for the selected algorithms. All of this has critical implications for the design of the GMDA.
In this paper, we introduce a new classification framework based on our novel implementation of the Moody-Darken architecture that is fast and semi-adaptive: the hidden layer uses a warm-start method for the non-convex optimization problem and is not required to be fully adaptive, while the output layer is fully adaptive and can be updated/downdated for each new input sample. After warm-starting the hidden layer, the output layer can be updated with new inputs independently of the hidden layer. We compare our new approach with the widely used SVM method to validate and test our model in terms of classification accuracy.
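The per-sample update of the output layer, independent of the frozen hidden layer, can be realized with a recursive least-squares (RLS) rule. The sketch below is one possible realization under that assumption, not the paper's actual estimator; the class name and the regularization parameter `lam` are illustrative.

```python
import numpy as np

class OnlineOutputLayer:
    """Recursive least-squares update of a linear output layer.

    Illustrative sketch: once the hidden layer is warm-started and
    frozen, each new sample's hidden activation h and label y update
    the output weights without revisiting earlier data.
    """
    def __init__(self, dim, lam=1e-3):
        self.P = np.eye(dim) / lam   # running inverse-covariance estimate
        self.w = np.zeros(dim)

    def update(self, h, y):
        # Sherman-Morrison rank-one update of P and gain-weighted
        # correction of the weights
        Ph = self.P @ h
        k = Ph / (1.0 + h @ Ph)
        self.w += k * (y - h @ self.w)
        self.P -= np.outer(k, Ph)

    def predict(self, h):
        return float(h @ self.w)

# Usage: recover a known linear map from streaming (h, y) pairs
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0])
layer = OnlineOutputLayer(dim=2)
for _ in range(200):
    h = rng.normal(size=2)
    layer.update(h, h @ true_w)
```

A downdate (removing a sample's influence) follows the same Sherman-Morrison algebra with the sign of the rank-one term flipped, which is what makes per-sample update/downdate cheap relative to refitting.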