RF-AdaCost: WebShell Detection Method that Combines Statistical Features and Opcode

2020 
A WebShell, also called a webpage backdoor, is typically planted after hackers compromise a website: they mix the backdoor files with normal webpage files in the WEB directory of the website service area, then use a browser to access the backdoor and obtain a command execution environment that lets them control the web server. Because of the flexibility of the PHP language and the growing number of concealment techniques used by hackers, WebShell detection methods face stringent requirements. The existing random forest–gradient boosting decision tree (RF-GBDT) algorithm relies on term frequency–inverse document frequency (TF-IDF), which ignores how feature words are distributed among classes and how well they discriminate between them, and the algorithm does not balance the false negative and false positive rates. This work proposes RF-AdaCost (random forest–misclassification cost-sensitive AdaBoost), a PHP WebShell detection model built on RF-GBDT. We use statistical characteristics of PHP source files, including information entropy and index of coincidence, and extract the opcode sequences of PHP source files, merging the statistical features with the opcode sequences to improve the efficiency of WebShell detection. Experimental results show that RF-AdaCost outperforms RF-GBDT.
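The two statistical features named above can be illustrated with a short sketch. It assumes the textbook definitions computed over the raw bytes of a PHP file: Shannon entropy H = -Σ p_i log2 p_i and index of coincidence IC = Σ n_i(n_i - 1) / (N(N - 1)), where n_i is the count of byte value i and N is the file length. The abstract does not state which characters RF-AdaCost counts or how it normalizes these values, and the function names are hypothetical. The usual rationale is that encoded or compressed payloads tend to show higher entropy and a lower index of coincidence than ordinary source code.

```python
import math
from collections import Counter


def information_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte.

    Assumption: computed over raw bytes; the paper's exact alphabet is not stated.
    """
    n = len(data)
    if n == 0:
        return 0.0
    counts = Counter(data)  # iterating bytes yields int byte values 0..255
    # H = sum(p * log2(1/p)); this form avoids returning -0.0 for uniform input
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def index_of_coincidence(data: bytes) -> float:
    """Probability that two bytes drawn without replacement are identical.

    Assumption: textbook IC over raw bytes, without the alphabet-size factor.
    """
    n = len(data)
    if n < 2:
        return 0.0
    counts = Counter(data)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


if __name__ == "__main__":
    sample = b"<?php echo 'hello world'; ?>"
    print(f"entropy={information_entropy(sample):.3f}",
          f"ic={index_of_coincidence(sample):.3f}")
```

For example, `information_entropy(b"abab")` is 1.0 bit per byte and `index_of_coincidence(b"abab")` is 1/3. In a detector, values like these would be appended to the opcode-derived features of each file before classification.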