Network traffic classification plays a significant role in cyber security and network management. Conventional statistical classification techniques assume that clean labelled samples are available for building classification models. In the big data era, however, mislabelled training data are common, owing to the introduction of new applications and a lack of prior knowledge about them. Existing statistical traffic classification techniques do not address the problem of mislabelled training data, so their performance degrades when such data are present. To meet this challenge, we propose a new scheme, Noise-resistant Statistical Traffic Classification (NSTC), which incorporates noise elimination and reliability estimation into traffic classification. After eliminating noisy samples, NSTC estimates the reliability of the remaining training data and then builds a robust traffic classifier. Traffic classification experiments on two real-world traffic data sets show that NSTC can effectively address the problem of mislabelled training data. Compared with state-of-the-art methods, NSTC significantly improves classification performance in the context of big unclean data.
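The abstract does not specify how NSTC eliminates noise or estimates reliability. The sketch below only illustrates the general "filter, then weight, then classify" pipeline, and every detail in it is an assumption rather than the NSTC method:

- A sample is treated as likely mislabelled when most of its k nearest neighbours carry a different label.
- Each kept sample gets a reliability weight equal to the fraction of neighbours that agree with its label.
- A reliability-weighted nearest-centroid classifier is then built from the kept samples.

The function names, toy features, `k` and `threshold` are hypothetical.

```python
import math
from collections import defaultdict

def neighbour_indices(X, i, k):
    # Indices of the k nearest neighbours of X[i] by Euclidean distance (X[i] itself excluded).
    order = sorted(range(len(X)), key=lambda j: math.inf if j == i else math.dist(X[i], X[j]))
    return order[:k]

def filter_and_weight(X, y, k=3, threshold=0.5):
    """Hypothetical noise filter: drop samples whose label disagrees with most of their
    k neighbours; attach to each kept sample a reliability weight equal to the
    fraction of neighbours that share its label."""
    kept = []
    for i in range(len(X)):
        agreement = sum(y[j] == y[i] for j in neighbour_indices(X, i, k)) / k
        if agreement >= threshold:
            kept.append((X[i], y[i], agreement))
    return kept

def weighted_centroids(kept):
    # Reliability-weighted mean feature vector for each class.
    sums, weights = {}, defaultdict(float)
    for x, label, w in kept:
        acc = sums.setdefault(label, [0.0] * len(x))
        for d, v in enumerate(x):
            acc[d] += w * v
        weights[label] += w
    return {label: [v / weights[label] for v in acc] for label, acc in sums.items()}

def classify(centroids, x):
    # Assign x to the class whose weighted centroid is nearest.
    return min(centroids, key=lambda label: math.dist(centroids[label], x))

# Hypothetical toy flow features: two clusters, each with one deliberately mislabelled sample.
X = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (1.0, 1.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (5.2, 5.1)]
y = ["web", "web", "web", "p2p", "p2p", "p2p", "p2p", "web"]

kept = filter_and_weight(X, y)
model = weighted_centroids(kept)
print(classify(model, (1.05, 1.0)), classify(model, (5.0, 5.05)))
```

On this toy data the two mislabelled samples, `(1.0, 1.1)` and `(5.2, 5.1)`, are filtered out before the centroids are computed.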
Providing cost-effective strategies for software testing has long been a research focus in software engineering. Many researchers have addressed the effectiveness and quality metrics of software testing, and many interesting results have been obtained. However, one issue of paramount importance in software testing, namely the intrinsically imprecise and uncertain relationships among testing metrics, remains unaddressed. To this end, a new quality and effectiveness measurement based on fuzzy logic is proposed. Software quality features and analogy-based reasoning are also discussed as a way of handling the consistency of quality and effectiveness across different test projects. Experimental results are provided to verify the proposed measurement.
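The abstract does not state which membership functions or inference rules the measurement uses. The sketch below only illustrates how fuzzy logic can grade imprecise testing metrics, and it rests on these assumptions, none of which come from the paper:

- Two hypothetical metrics, coverage and defect detection, are normalised to [0, 1].
- Each metric is graded with triangular membership functions.
- A three-rule base combines the grades, and zero-order Sugeno defuzzification produces the final score.

```python
# Hypothetical fuzzy grades for a metric normalised to [0, 1]: (left foot, peak, right foot).
GRADES = {"low": (-0.5, 0.0, 0.5), "medium": (0.0, 0.5, 1.0), "high": (0.5, 1.0, 1.5)}

def triangular(x, a, b, c):
    """Triangular membership: 1 at the peak b, falling linearly to 0 at a and c."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def fuzzify(x):
    # Degree of membership of x in each grade.
    return {grade: triangular(x, *abc) for grade, abc in GRADES.items()}

def effectiveness(coverage, detection):
    """Hypothetical fuzzy effectiveness score in [0, 1] from two metrics in [0, 1]."""
    cov, det = fuzzify(coverage), fuzzify(detection)
    # Hypothetical rule base: (firing strength, crisp output level); AND = min, OR = max.
    rules = [
        (min(cov["high"], det["high"]), 1.0),      # both high -> effective
        (max(cov["medium"], det["medium"]), 0.5),  # either medium -> moderately effective
        (max(cov["low"], det["low"]), 0.0),        # either low -> ineffective
    ]
    total = sum(strength for strength, _ in rules)
    if total == 0:
        raise ValueError("no rule fired; inputs must lie in [0, 1]")
    # Zero-order Sugeno defuzzification: strength-weighted average of rule outputs.
    return sum(strength * out for strength, out in rules) / total

print(effectiveness(0.8, 0.6))
```

Under these assumptions, `effectiveness(0.8, 0.6)` evaluates to approximately 0.6. In that case the "high" rule fires weakly and the "medium" rule fires strongly, so the score lands between their output levels.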
An increasingly popular and promising approach to complex disease diagnosis is to employ artificial neural networks (ANNs). Single nucleotide polymorphism (SNP) data from individuals are used as ANN inputs to identify specific SNP patterns associated with a certain disease. Because of the large number of SNPs, it is crucial to select an optimal SNP subset and combination so that the number of ANN inputs can be reduced. With this observation in mind, a hybrid approach combining genetic algorithms (GAs) and ANNs, called GANN, is used to automatically determine the optimal SNP set and optimize the ANN structure. The proposed GANN algorithm is evaluated on both a synthetic dataset and a real SNP dataset of a complex disease.
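The abstract does not give GANN's encoding, operators or fitness function. The following sketch covers only the SNP-subset-selection half and does not optimize ANN structure. It assumes the following:

- Each candidate subset is encoded as a bit mask (1 = SNP selected).
- Masks are evolved with binary tournament selection, one-point crossover, bit-flip mutation and elitism.
- `toy_fitness`, `CAUSAL` and `N_SNPS` are hypothetical. `toy_fitness` stands in for the ANN validation accuracy that a real GANN-style system would compute.

```python
import random

def tournament(pop, fitness, rng):
    # Binary tournament: return the fitter of two randomly chosen individuals.
    a, b = rng.sample(pop, 2)
    return max(a, b, key=fitness)

def evolve(fitness, n_genes, pop_size=30, generations=40, p_mut=0.05, seed=0):
    """Evolve binary masks (1 = SNP selected) to maximise `fitness`."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        elite = max(pop, key=fitness)
        history.append(fitness(elite))
        next_pop = [elite[:]]  # elitism: the best mask survives unchanged
        while len(next_pop) < pop_size:
            p1, p2 = tournament(pop, fitness, rng), tournament(pop, fitness, rng)
            cut = rng.randrange(1, n_genes)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation, applied independently to each gene with probability p_mut.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness), history

# Hypothetical stand-in for ANN validation accuracy: reward the (assumed) causal SNPs
# and penalise every selected SNP so that smaller input sets are preferred.
CAUSAL = {2, 7, 11}
N_SNPS = 20

def toy_fitness(mask):
    return sum(mask[i] for i in CAUSAL) - 0.1 * sum(mask)

best, history = evolve(toy_fitness, N_SNPS)
print("selected SNP indices:", [i for i, g in enumerate(best) if g])
```

Elitism guarantees that the best fitness never decreases from one generation to the next. The size penalty is one simple way to express the abstract's goal of reducing the number of ANN inputs.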
Many organizations struggle with the massive amount of data they collect. Today, data do more than serve as raw material for churning out statistical reports: they help support efficient operations in many organizations and, to some extent, provide the competitive intelligence organizations need to survive in today's economy. Because data are constantly changing, data mining can't always deliver timely and relevant results. Stream-data processing, however, might be more effective, judging by the Matrix project.
Traditional traffic information acquisition is mainly implemented with sensors, and such sensor-based acquisition systems have significant drawbacks. With the spread of traffic monitoring, however, computer vision technology increasingly has a platform on which it can be applied to identify and track traffic conditions. After studying traditional and deep-learning-based multi-target recognition algorithms and two common multi-target tracking algorithms, this paper proposes a solution that combines the YOLO v3 network with the Deep-SORT algorithm. Traffic video of urban roads is collected directly in areas with relatively heavy traffic flow. Frames extracted at intervals from this video are used to build datasets for training and validating the YOLO v3 neural network. Building on the test results, an open-source vehicle deep-feature dataset is used to train the weight file for vehicle deep features, and the Deep-SORT algorithm performs target tracking, enabling real-time and more accurate multi-target recognition and tracking of moving vehicles.
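In Deep-SORT, each frame's detections are associated with existing tracks using three components:

- Kalman-filter motion prediction.
- Appearance features from a trained deep network.
- Hungarian matching.

The sketch below deliberately replaces all of that with greedy intersection-over-union (IoU) matching on raw boxes. It is meant only to show the detect-then-associate step that links per-frame YOLO v3 detections into vehicle tracks. `associate`, `GreedyTracker` and the `min_iou` threshold are hypothetical names and values, and the tracker drops unmatched tracks immediately instead of keeping them alive for a few frames.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def associate(tracks, detections, min_iou=0.3):
    """Greedily match track boxes to detection boxes in order of descending IoU."""
    pairs = sorted(((iou(t, d), ti, di) for ti, t in enumerate(tracks)
                    for di, d in enumerate(detections)), reverse=True)
    matched_t, matched_d, matches = set(), set(), []
    for score, ti, di in pairs:
        if score < min_iou:
            break  # all remaining pairs overlap too little
        if ti in matched_t or di in matched_d:
            continue  # each track and each detection is used at most once
        matches.append((ti, di))
        matched_t.add(ti)
        matched_d.add(di)
    unmatched_dets = [j for j in range(len(detections)) if j not in matched_d]
    return matches, unmatched_dets

class GreedyTracker:
    """Hypothetical minimal tracker: keeps one box per track ID."""

    def __init__(self, min_iou=0.3):
        self.min_iou = min_iou
        self.tracks = {}  # track ID -> most recent box
        self.next_id = 1

    def update(self, detections):
        ids = list(self.tracks)
        matches, new_dets = associate([self.tracks[i] for i in ids], detections, self.min_iou)
        updated = {ids[ti]: detections[di] for ti, di in matches}
        for di in new_dets:  # unmatched detections start new tracks
            updated[self.next_id] = detections[di]
            self.next_id += 1
        self.tracks = updated  # unmatched tracks are dropped immediately
        return dict(self.tracks)

tracker = GreedyTracker()
frame1 = tracker.update([(0, 0, 10, 10), (50, 50, 60, 60)])
frame2 = tracker.update([(52, 51, 62, 61), (1, 0, 11, 10)])
print(frame1, frame2)
```

In the usage example the detections arrive in a different order in the second frame, but each vehicle keeps its track ID because association is driven by box overlap rather than list position.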