Feature selection for image spam classification

2010 
This paper considers the low-level feature modeling problem in image spam classification, in which most of the prevalent content-based spam filters are shown to be inefficient because their OCR procedures are vulnerable to text-obscuring attacks by spammers. We first built a basic feature set through a low-level feature extraction process, and then proposed a stepwise regression method, controlled by a minimum description length criterion, to determine the best subset automatically. Experimental results indicate that the proposed approach is very effective for modeling spam images and that the selected feature set is applicable to practical anti-spam tasks, with performance comparable to that of some other cutting-edge approaches.
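
The selection step described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a plain least-squares model, forward-only stepwise search, and a common two-part approximation to description length, (n/2) log(RSS/n) + (k/2) log(n), where n is the number of images and k the number of fitted coefficients. The function names `mdl_score` and `forward_stepwise_mdl` are hypothetical, and `X` stands for any matrix of low-level image features with `y` holding the spam/non-spam labels as numbers.

```python
import numpy as np


def mdl_score(X, y, subset):
    """Approximate description length of a least-squares fit on `subset`.

    Assumption: two-part code, (n/2)*log(RSS/n) + (k/2)*log(n).
    """
    n = len(y)
    # Design matrix: intercept column plus the chosen feature columns.
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ coef) ** 2))
    rss = max(rss, 1e-12)  # guard against log(0) on a perfect fit
    k = A.shape[1]
    return 0.5 * n * np.log(rss / n) + 0.5 * k * np.log(n)


def forward_stepwise_mdl(X, y):
    """Greedily add the feature that lowers the score most; stop when none does."""
    selected = []
    remaining = list(range(X.shape[1]))
    best = mdl_score(X, y, selected)
    while remaining:
        # Score every candidate extension of the current subset.
        score, j = min((mdl_score(X, y, selected + [j]), j) for j in remaining)
        if score >= best:
            break  # no candidate shortens the description; stop
        best = score
        selected.append(j)
        remaining.remove(j)
    return selected, best
```

The penalty term (k/2) log(n) is what makes the stopping rule automatic: a feature is kept only if the reduction in residual error outweighs the cost of encoding one more coefficient.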