Towards Unknown Traffic Identification via Embeddings and Deep Autoencoders

2019 
Traffic classification, a fundamental tool for network management and security, suffers from a critical problem: "unknown traffic". Unknown traffic is network traffic generated by applications previously unknown to a traffic classification system (i.e., zero-day applications). The key to solving this problem is the ability to divide mixed unknown traffic into clusters, each containing, as far as possible, the traffic of only one application. This paper reports our recent exploration of n-gram embeddings, deep neural networks, and clustering algorithms for constructing an unsupervised scheme for unknown network traffic identification. Experimental results on real-world traces show that our method achieves an average clustering purity of about 97.35% when DNS, DHCP, BitTorrent, SSH, HTTP, IMAP, MySQL, and Github traffic is used to simulate unknown traffic.
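For readers unfamiliar with the evaluation metric, clustering purity can be sketched as follows. This is a minimal illustration that assumes the standard definition of purity (each cluster is credited with its most frequent true application, and the credited flows are divided by the total number of flows); the abstract does not state exactly how the authors compute or average it. The function name `clustering_purity` and the toy labels are hypothetical.

```python
from collections import Counter


def clustering_purity(cluster_ids, true_labels):
    """Return the fraction of flows that belong to the majority application
    of their cluster (standard purity; assumed, not taken from the paper)."""
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have equal length")
    if len(cluster_ids) == 0:
        raise ValueError("at least one flow is required")

    # Count how often each true application appears inside each cluster.
    per_cluster = {}
    for cluster, label in zip(cluster_ids, true_labels):
        per_cluster.setdefault(cluster, Counter())[label] += 1

    # Each cluster contributes the size of its largest single-application group.
    majority_total = sum(max(counts.values()) for counts in per_cluster.values())
    return majority_total / len(cluster_ids)


# Toy example: 8 flows assigned to 3 clusters (labels are illustrative only).
clusters = [0, 0, 0, 1, 1, 2, 2, 2]
labels = ["DNS", "DNS", "SSH", "SSH", "SSH", "HTTP", "HTTP", "DNS"]
print(clustering_purity(clusters, labels))  # 0.75
```

A purity of 1.0 means every cluster contains flows from a single application. Purity alone rewards over-segmentation (one cluster per flow is trivially pure), so it is usually read together with the number of clusters produced.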