PacketCGAN: Exploratory Study of Class Imbalance for Encrypted Traffic Classification Using CGAN

2020 
With the popularity of Deep Learning (DL), researchers have begun to apply DL to encrypted traffic classification problems. Although these methods can extract traffic features automatically, improving on the feature engineering of traditional methods such as DPI, they still need a large amount of data to learn the characteristics of the various types of traffic. The performance of a classification model therefore depends significantly on the quality of its datasets. Building datasets, however, is a time-consuming and costly task, especially for encrypted traffic. It is often more difficult to collect a large number of traffic samples from unpopular applications than from well-known ones, which leads to class imbalance between major and minor encrypted applications in datasets. In this paper, we propose a novel traffic data augmentation method called PacketCGAN, based on Conditional GAN (CGAN), which can control the modes of the data to be generated. PacketCGAN exploits the ability of CGAN to generate specified samples, taking the application type as the conditional input, and thereby achieves data balancing. As a proof of concept, three classical DL models, including CNN, were adopted to classify four encrypted traffic datasets built from public datasets and augmented by Random Over Sampling (ROS), SMOTE (Synthetic Minority Over-sampling Technique), vanilla GAN, and PacketCGAN, respectively. The experimental results demonstrate that DL-based encrypted traffic classifiers trained on the dataset augmented by PacketCGAN achieve better classification performance than those trained on the other three.
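The two ideas contrasted in the abstract can be illustrated with a minimal sketch. It assumes toy one-dimensional samples and hypothetical class names ("chat", "voip"); neither comes from the paper. The first function is a plain Random Over Sampling baseline that duplicates minority-class samples. The second shows the common CGAN convention of appending a one-hot class label to the noise vector so that a generator can be asked for a specific class. The abstract does not describe PacketCGAN's actual network inputs, so this conditioning scheme is an assumption rather than the authors' design.

```python
import random
from collections import Counter


def random_over_sample(samples, labels, seed=0):
    """ROS baseline: duplicate randomly chosen samples of each smaller
    class until every class is as large as the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


def conditional_input(noise, class_index, num_classes):
    """Assumed CGAN-style generator input: noise followed by a one-hot label."""
    one_hot = [0.0] * num_classes
    one_hot[class_index] = 1.0
    return list(noise) + one_hot


# Toy imbalanced dataset: four "chat" samples, one "voip" sample (hypothetical).
samples = [[0.1], [0.2], [0.3], [0.4], [0.9]]
labels = ["chat", "chat", "chat", "chat", "voip"]

bal_x, bal_y = random_over_sample(samples, labels)
counts = Counter(bal_y)

# Request a sample of class index 1 from a (not shown) conditional generator.
z = [random.gauss(0.0, 1.0) for _ in range(4)]
g_in = conditional_input(z, class_index=1, num_classes=2)
```

ROS can only repeat existing minority samples, whereas a conditional generator, once trained, could in principle produce new samples for whichever class label is supplied.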