The Challenge of Only One Flow Problem for Traffic Classification in Identity Obfuscation Environments

Hong-Yen Chen,Tsung-Nan Lin

The Challenge of Only One Flow Problem for Traffic Classification in Identity Obfuscation Environments

2021

As encrypted traffic grows, network flow classification has become a significant issue because of the impossibility to parse the payload in an encrypted packet. A possible packet sniffing location for organizations is an under control gateway between intranet and internet to inspect network traffic. However, when an intranet user uses an identity obfuscation protocol such as VPN or TOR, the packet IP and port would be rewritten to preserve user privacy. The same user’s packet sniffed between a user and TOR entry node/VPN proxy always has the same 5-tuples (packets with the same source IP, destination IP, source port, destination port, and IP protocol defined as flow). Thus, we cannot rely on the 5-tuples rule to split traffic into flows. This challenge is called the “only one flow problem” and poses an obstacle for flow classification. A previous solution uses timeout value to determine flow separation points to address this issue. However, the predefined static time threshold cannot fit all user habits, which leads to separation errors. To overcome timeout limitations, we propose a flexible method called AI-FlowDet by leveraging the scene change concept and a CNN model to find behavior change points of traffic based on learning data. AI-FlowDet can apply to the traffic of the identity obfuscation protocols. Next, we propose 294 size-based and direction-based features that can be used with AI-FlowDet to evaluate flow type classification performance. Every experiment leverages different machine learning algorithms. The results show that AI-FlowDet with the proposed features can achieve 98.5% weighted accuracy, which is increased by 12.6% versus the previous timeout method with baseline features. We proved that the proposed splitting methods for the only one flow problem and proposed features for flow type classification are effective based on the good results obtained for both the VPN and TOR datasets.

Keywords:

Correction
Source
Cite
Save
Machine Reading By IdeaReader

References

Citations