Towards Unknown Traffic Identification Using Deep Auto-Encoder and Constrained Clustering

2019 
Network traffic identification, a fundamental technique in cybersecurity, suffers from a critical problem: “unknown traffic”. Unknown traffic is network traffic generated by applications that a pre-constructed traffic classification system has never seen before (i.e., zero-day applications). The key to solving this problem is the ability to divide mixed unknown traffic into multiple clusters, each of which contains, as far as possible, the traffic of only one application. In this paper, we propose DePCK, a framework that improves clustering purity. The framework has two main innovations: (i) it learns to extract bottleneck features from traffic statistical characteristics via a deep auto-encoder; (ii) it uses flow correlation to guide pairwise constrained k-means. To verify the effectiveness of the framework, we conduct comparative experiments on two real-world datasets. The experimental results show that the clustering purity of DePCK can exceed 94.81% on the ISP-data and 91.48% on the WIDE-data [1], outperforming the state-of-the-art methods RTC [20] and k-means with log data [15].
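The abstract does not give DePCK's exact formulation, so the following is only a minimal sketch of the clustering stage under stated assumptions. It assumes the deep auto-encoder has already produced a bottleneck feature matrix `X` (one row per flow). It also assumes the flow-correlation rule used in prior work such as RTC [20]: flows sharing the same destination IP, destination port, and transport protocol are treated as coming from the same application and become must-link constraints. Clustering is pairwise constrained k-means in the style of PCKMeans, where each violated must-link adds a penalty `w` to the assignment cost, and purity is the fraction of flows that belong to their cluster's majority application. The helper names (`must_links_from_flows`, `pck_means`, `purity`), the weight `w`, and the farthest-first initialization are hypothetical illustration choices, not the paper's.

```python
import numpy as np
from collections import Counter, defaultdict


def must_links_from_flows(flow_keys):
    """Hypothetical flow-correlation rule: flows with an identical key
    (e.g. (dst_ip, dst_port, protocol)) are linked to one another."""
    groups = defaultdict(list)
    for idx, key in enumerate(flow_keys):
        groups[key].append(idx)
    links = []
    for members in groups.values():
        # Chaining consecutive members is enough to connect the whole group.
        links.extend(zip(members, members[1:]))
    return links


def pck_means(X, k, must_links, w=1.0, n_iter=20, seed=0):
    """Pairwise constrained k-means with must-link penalties only (a sketch)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Farthest-first initialization: a random first center, then the point
    # farthest from all centers chosen so far.
    centers = [X[int(rng.integers(n))]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(d2))])
    centers = np.array(centers, dtype=float)
    # Start from plain nearest-center labels.
    labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)

    neighbors = defaultdict(list)
    for i, j in must_links:
        neighbors[i].append(j)
        neighbors[j].append(i)

    for _ in range(n_iter):
        changed = False
        for i in range(n):
            # Cost of putting flow i in each cluster: half squared distance ...
            cost = 0.5 * ((X[i] - centers) ** 2).sum(axis=1)
            # ... plus w for each linked flow that would end up in another cluster.
            for j in neighbors[i]:
                cost += w
                cost[labels[j]] -= w
            new = int(np.argmin(cost))
            if new != labels[i]:
                labels[i] = new
                changed = True
        # Recompute centers; an empty cluster keeps its previous center.
        for h in range(k):
            members = X[labels == h]
            if len(members):
                centers[h] = members.mean(axis=0)
        if not changed:
            break
    return labels, centers


def purity(labels, truth):
    """Fraction of samples that belong to their cluster's majority class."""
    total = 0
    for h in np.unique(labels):
        total += Counter(truth[labels == h]).most_common(1)[0][1]
    return total / len(truth)
```

In this sketch the penalty weight `w` trades off feature-space distance against agreement with the flow-correlation constraints: a large `w` makes correlated flows almost always share a cluster, while `w = 0` reduces the assignment step to ordinary k-means.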