Enhancing Multimodal Clustering Framework with Deep Learning to Reveal Image Spam Authorship

2021 
This paper introduces a multimodal framework for clustering spam images received in unsolicited emails. Spam images in the same cluster have similar visual and textual content and may have been generated by a common spam source. To perform the clustering task, we first extract three main categories of features: 1) visual features, extracted by pretrained convolutional neural networks (CNNs); 2) layout features, which describe the location of illustrations in the spam images; and 3) text features, extracted by an optical character recognition (OCR) algorithm. We then use a two-stage hierarchical clustering framework to form clusters based on the pairwise similarity matrices of the extracted features. We evaluate the proposed approach on a dataset of 2,100 spam images collected from three months of emails. The experimental results show that the proposed method achieves satisfactory clustering results as measured by the V-measure, an external entropy-based metric.
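The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the linkage criterion, the similarity thresholds, or which features drive which stage, so the code below assumes average linkage, a coarse stage on visual similarity followed by a refinement stage on text similarity, and it omits the layout features. The names `average_linkage`, `two_stage`, `t_visual`, and `t_text` are hypothetical. The `v_measure` function follows the standard definition, the harmonic mean of homogeneity and completeness.

```python
import math
from collections import Counter


def average_linkage(sim, members, threshold):
    """Agglomerative clustering of the indices in `members`.

    Repeatedly merges the two clusters with the highest average pairwise
    similarity, stopping once that similarity falls below `threshold`.
    """
    clusters = [[i] for i in members]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
                avg = total / (len(clusters[a]) * len(clusters[b]))
                if avg > best:
                    best, pair = avg, (a, b)
        if best < threshold:
            break
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def two_stage(visual_sim, text_sim, t_visual, t_text):
    """Assumed ordering: coarse clusters from visual similarity,
    each then split further using text similarity."""
    labels = [0] * len(visual_sim)
    next_label = 0
    for coarse in average_linkage(visual_sim, range(len(visual_sim)), t_visual):
        for fine in average_linkage(text_sim, coarse, t_text):
            for i in fine:
                labels[i] = next_label
            next_label += 1
    return labels


def entropy(counts, n):
    """Shannon entropy (natural log) of a partition given its cell counts."""
    return -sum(c / n * math.log(c / n) for c in counts if c)


def v_measure(truth, pred):
    """Harmonic mean of homogeneity and completeness."""
    n = len(truth)
    h_c = entropy(Counter(truth).values(), n)             # H(C)
    h_k = entropy(Counter(pred).values(), n)              # H(K)
    h_ck = entropy(Counter(zip(truth, pred)).values(), n)  # H(C, K)
    # H(C|K) = H(C,K) - H(K) and H(K|C) = H(C,K) - H(C)
    homogeneity = 1.0 if h_c == 0 else 1.0 - (h_ck - h_k) / h_c
    completeness = 1.0 if h_k == 0 else 1.0 - (h_ck - h_c) / h_k
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


# Toy data (made up for illustration): images 0-2 look alike,
# but only images 0 and 1 carry similar text.
visual_sim = [[1.0, 0.9, 0.8, 0.1],
              [0.9, 1.0, 0.85, 0.2],
              [0.8, 0.85, 1.0, 0.1],
              [0.1, 0.2, 0.1, 1.0]]
text_sim = [[1.0, 0.9, 0.2, 0.1],
            [0.9, 1.0, 0.1, 0.1],
            [0.2, 0.1, 1.0, 0.0],
            [0.1, 0.1, 0.0, 1.0]]

labels = two_stage(visual_sim, text_sim, t_visual=0.5, t_text=0.5)
print(labels)                           # [0, 0, 1, 2]
print(v_measure([7, 7, 8, 9], labels))  # perfect agreement with this ground truth
```

In this toy run the visual stage groups images 0, 1, and 2, and the text stage then separates image 2 from the other two. Because the V-measure compares partitions rather than label names, the ground-truth labels `[7, 7, 8, 9]` match the predicted `[0, 0, 1, 2]` perfectly.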