Exploring Malicious Webpages Using Machine Learning Concept

2021 
Malicious web page identification has become an essential process in the digital era. With the growing use of digital devices running applications from many third-party service providers, classifying a web page as legitimate or malicious is a pressing need. The input considered for this classification is the URL of the web page. The proposed solution computes Term Frequency-Inverse Document Frequency (TF-IDF) scores from the URL and feeds them to existing machine learning models, namely logistic regression and other classifiers; a comparative analysis is then made to choose the best one. In addition, a whitelist can be maintained: a list of URLs that have previously been classified as legitimate. When the system encounters a new URL, it first checks whether the URL is in the whitelist; if it is not found, the URL is fed into the machine learning model for classification. This work uses logistic regression and compares it with random forest and support vector machine (SVM) classifiers. The comparative analysis suggests that logistic regression produces the best results of the three in terms of false classification rates. The results also indicate that TF-IDF features combined with binary classifiers and a whitelist work efficiently.