Towards classification of email through selection of informative features

2020 
Spammers' attempts to dodge spam filters have become a serious problem in classifying emails as spam or ham. Extremely high feature dimensionality is a major characteristic of email classification, so training a classifier without reducing its predictive capability is essential. Feature selection methods play an important role in identifying the features most relevant to classifying emails. In this work, features are extracted from the relationships established between words of the different classes, which increases the probability that the selected words represent informative features; a word may be positive or negative in nature. In this paper, the relationship among the words present in the subject and content of emails is used to determine the nature of each word, and the most related words are then selected from the full set of words to form informative features. These words are then used to generate N-grams. Four different classifiers, namely Decision Tree, Multinomial Naive Bayes, Random Forest, and Linear Support Vector Machine, are used to evaluate the performance of the selected N-gram features. The experimental analysis is performed on the Ling-Spam email dataset.
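The following is a minimal, illustrative sketch of the kind of pipeline the abstract describes, not the authors' exact method: N-gram count features built from email text and evaluated with the four classifier families named above. It assumes scikit-learn, a list of raw email strings `emails`, and binary labels `labels` (1 = spam, 0 = ham) loaded from the Ling-Spam corpus; simple frequency-based pruning stands in for the paper's relationship-based informative-feature selection.

```python
# Sketch only: N-gram features + the four classifiers named in the abstract.
# `emails` and `labels` are assumed to be pre-loaded from the Ling-Spam corpus.
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import LinearSVC
from sklearn.metrics import f1_score

def evaluate_ngram_features(emails, labels, ngram_range=(1, 2)):
    """Build N-gram count features and report F1 for each classifier."""
    X_train_txt, X_test_txt, y_train, y_test = train_test_split(
        emails, labels, test_size=0.2, random_state=42, stratify=labels)

    # Frequency-based pruning used here in place of the paper's
    # relationship-based selection of informative words.
    vectorizer = CountVectorizer(ngram_range=ngram_range, min_df=2,
                                 stop_words="english", max_features=5000)
    X_train = vectorizer.fit_transform(X_train_txt)
    X_test = vectorizer.transform(X_test_txt)

    classifiers = {
        "Decision Tree": DecisionTreeClassifier(random_state=42),
        "Multinomial Naive Bayes": MultinomialNB(),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
        "Linear SVM": LinearSVC(),
    }
    for name, clf in classifiers.items():
        clf.fit(X_train, y_train)
        print(f"{name}: F1 = {f1_score(y_test, clf.predict(X_test)):.3f}")
```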