Hierarchical Clustering Based Network Traffic Data Reduction for Improving Suspicious Flow Detection

2018 
Attacks such as advanced persistent threats (APTs) persist over long periods, so detecting suspicious flows requires analyzing long-term data. However, effectively analyzing massive data sources for suspicious flow diagnosis remains an unmet challenge. Consequently, flow data reduction, which extracts the most relevant information from a massive dataset, should be adopted. Existing flow sampling approaches are inherently inaccurate unless run at a high sampling rate. In this paper, we propose HCBS (Hierarchical Clustering Based Sampling), a flow data reduction scheme, to alleviate these problems. We study the characteristics of flow data related to malicious activities and employ hierarchical clustering to sample data for further in-depth detection. Experiments on the 1999 DARPA dataset demonstrate that HCBS reduces the size of the flow data by 40% with only a small loss in accuracy and significantly outperforms the compared state-of-the-art approaches.
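
The general idea of sampling flows through hierarchical clustering can be illustrated with a minimal sketch. This is not the HCBS algorithm itself, whose details the abstract does not give. It assumes each flow is already reduced to a normalized numeric feature vector, uses average-linkage agglomerative clustering with a distance threshold, and keeps a fixed fraction of each cluster while retaining at least one flow per cluster. The function names, the two features, the threshold, and the sampling rate are all hypothetical choices made for illustration.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative_clusters(points, threshold):
    """Average-linkage agglomerative clustering (illustrative, O(n^3)).

    Starts with one cluster per point and repeatedly merges the two
    closest clusters until the closest pair is farther apart than
    `threshold`. Returns a list of clusters, each a list of indices.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(c1, c2):
        total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > threshold:
            break
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


def sample_flows(points, threshold, rate, seed=0):
    """Keep roughly `rate` of each cluster, but never fewer than one flow.

    Assumption of this sketch: keeping at least one member per cluster
    prevents small, unusual groups of flows from being sampled away.
    """
    rng = random.Random(seed)
    kept = []
    for cluster in agglomerative_clusters(points, threshold):
        k = max(1, round(len(cluster) * rate))
        kept.extend(rng.sample(cluster, k))
    return sorted(kept)


# Hypothetical normalized features per flow: (duration, byte count).
flows = [
    (0.10, 0.10), (0.11, 0.10), (0.10, 0.12), (0.12, 0.11), (0.09, 0.10),
    (0.10, 0.09), (0.11, 0.12), (0.12, 0.09), (0.09, 0.11), (0.10, 0.11),
    (0.90, 0.95), (0.92, 0.93),  # two unusual flows, far from the bulk
]

kept_indices = sample_flows(flows, threshold=0.2, rate=0.5)
print(len(flows), "flows reduced to", len(kept_indices))
```

Uniform random sampling at the same rate could drop both unusual flows by chance, whereas sampling within clusters always retains at least one of them. For realistic data volumes, a library implementation of hierarchical clustering would be preferable to this cubic-time loop.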