Feature selection for image spam classification

Qiao Liu, Fengli Zhang, Zhiguang Qin, Chao Wang, Shuang Chen, Qiuming Ma

2010

This paper considers the low-level feature modeling problem in image spam classification, in which most prevalent content-based spam filters are shown to be inefficient because their OCR procedures are vulnerable to text-obscuring attacks by spammers. We first built a basic feature set through a low-level feature extraction process, and then proposed a stepwise regression method, controlled by a minimum description length criterion, to determine the best subset automatically. Experimental results indicate that the proposed approach is effective for modeling spam images and that the selected feature set is applicable to practical anti-spam tasks, with performance comparable to some other cutting-edge approaches.
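
As a rough illustration of the idea of stepwise selection under a minimum description length criterion, the sketch below runs a forward stepwise search over feature columns and stops when adding a feature no longer shortens the description length. It is not the authors' procedure: the abstract does not give their exact criterion, so the two-part score used here (a Gaussian least-squares data cost plus a (k/2) log n parameter cost), the forward-only search, and the names `mdl_score` and `forward_stepwise_mdl` are all assumptions made for illustration.

```python
import numpy as np

def mdl_score(X, y, subset):
    """Assumed two-part description length (in nats) of a least-squares fit.

    data cost  = (n/2) * log(RSS / n)
    model cost = (k/2) * log(n), with k counting the intercept too.
    """
    n = len(y)
    # Design matrix: intercept column plus the chosen feature columns.
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ coef) ** 2))
    rss = max(rss, 1e-12)  # guard against log(0) on a perfect fit
    data_cost = 0.5 * n * np.log(rss / n)
    model_cost = 0.5 * A.shape[1] * np.log(n)
    return data_cost + model_cost

def forward_stepwise_mdl(X, y):
    """Greedily add the feature that lowers the score most; stop when none does."""
    selected = []
    remaining = list(range(X.shape[1]))
    best = mdl_score(X, y, selected)  # intercept-only baseline
    while remaining:
        score, j = min((mdl_score(X, y, selected + [j]), j) for j in remaining)
        if score >= best:
            break  # no candidate shortens the description length
        selected.append(j)
        remaining.remove(j)
        best = score
    return selected, best

if __name__ == "__main__":
    # Synthetic stand-in for a low-level feature matrix (hypothetical data).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 6))
    y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=200)
    print(forward_stepwise_mdl(X, y))
```

In a spam-filtering setting, `X` would hold the extracted low-level image features and `y` a numeric class label; the penalty term is what keeps the search from accumulating features that barely improve the fit.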

Keywords:

Computer science
Minimum description length
Feature selection
Histogram
Contextual image classification
Feature (computer vision)
Image spam
Feature extraction
Stepwise regression
Machine learning
Artificial intelligence
Pattern recognition
Electronic mail
Minimum description length criterion
Data mining
