Addressing the train-test gap on traffic classification combined subflow model with ensemble learning

2020 
Abstract Previous machine learning-based network traffic classification approaches assume that the training and testing network environments are the same. This assumption is invalid in most real cases because traffic features change, which leads to the train–test gap issue: a model trained in the training environment performs poorly in the testing environment. In this paper, to address the gap, we propose CSA, a traffic classification approach based on packet-wise segmentation and aggregation. First, we observe that some specific fragments of network flows, called subflows, are robust against the gap. We are therefore motivated to segment traffic flows into different subflows. Afterward, with a justification of our feature selection, 26 statistical features are extracted from each subflow and fed into its corresponding sub-classifier. Second, using the results from the sub-classifiers, we develop an aggregation method based on their classification accuracy to increase the overall classification performance. We experiment on five real datasets, including three collected from the Northwest Center of CERNET (China Education and Research Network) and two from public traces. Comparison with state-of-the-art baselines demonstrates the effectiveness of CSA against the gap.
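The segment-then-aggregate idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact segmentation rule or aggregation formula, so everything below is an assumption made for illustration: flows are represented as lists of packet sizes, subflows are fixed-length non-overlapping windows, and the aggregation is a plain accuracy-weighted vote. The names `segment_flow` and `aggregate` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def segment_flow(packets: Sequence[int], subflow_len: int) -> List[List[int]]:
    """Split a flow (a list of packet sizes) into consecutive subflows.

    Assumption: fixed-length, non-overlapping windows counted in packets.
    The last subflow may be shorter than subflow_len.
    """
    if subflow_len <= 0:
        raise ValueError("subflow_len must be positive")
    return [list(packets[i:i + subflow_len])
            for i in range(0, len(packets), subflow_len)]


def aggregate(predictions: Sequence[str], accuracies: Sequence[float]) -> str:
    """Combine sub-classifier outputs with an accuracy-weighted vote.

    Assumption: each sub-classifier's weight is its measured accuracy
    (for example on a validation set); the label with the largest
    summed weight wins.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    if len(predictions) != len(accuracies):
        raise ValueError("predictions and accuracies must have equal length")
    scores: Dict[str, float] = defaultdict(float)
    for label, acc in zip(predictions, accuracies):
        scores[label] += acc
    return max(scores, key=scores.get)


# Usage: three subflows, one sub-classifier prediction per subflow.
subflows = segment_flow([60, 1500, 1500, 52, 52], subflow_len=2)
label = aggregate(["web", "p2p", "p2p"], [0.9, 0.4, 0.4])
```

In this sketch a single accurate sub-classifier can outvote two weaker ones, which is the intuition behind weighting by accuracy rather than counting votes equally.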