Detecting DDoS Attacks Based on a PSO Clustering Algorithm

2016 
First, this article analyzes the attack principles and characteristics of application-layer Distributed Denial of Service (DDoS) attacks. Based on the difference between the browsing patterns of normal users and those of abnormal ones, user sessions are extracted from the web logs of normal users and the similarities between sessions are calculated. Because the traditional K-means clustering algorithm easily falls into local optima, a Particle Swarm Optimization (PSO) K-means clustering algorithm is used to generate a detection model. This model can be used to determine whether undetermined sessions are DDoS attacks. The experiments show that this method detects attacks effectively and adapts well.

Introduction

Distributed denial of service attacks are one of the major threats to the security of the Internet. Without any warning they consume the resources of the target, and they can be launched at the network layer or at the application layer. Application-layer DDoS has two attack modes: bandwidth depletion and host resource depletion. Existing methods for this kind of problem include intrusion detection based on data packets, detection based on flow limitation, detection based on access frequency, detection based on the hidden semi-Markov model, and detection based on data mining of user behavior. The literature proposes a DoS detection method based on data mining that combines the Apriori algorithm with the k-means clustering algorithm. It uses network data to detect DDoS, so it cannot cope with application-layer DDoS. The k-means algorithm also has a flaw of its own: it depends heavily on the choice of initial cluster centers, and for some initial values it may converge to a sub-optimal solution.

Application layer DDoS detection based on PSO clustering algorithm

Principle and model of detection

This paper establishes a detection model that identifies application-layer DDoS by analyzing user behavior.
The system design is shown in Figure 1.

Figure 1. System module design

3rd International Conference on Materials Engineering, Manufacturing Technology and Control (ICMEMTC 2016) © 2016. The authors. Published by Atlantis Press.

Description of user browsing behavior

The Web log records information about each user access to the server, including the user's IP address, the client, the customer identification, the time at which the Web server received the request, the customer request, the request status code, and the number of bytes transmitted. The Web log is extracted, the information is preprocessed, and the results are translated into sessions:

$$S_i = \{ip_i, t_1, u_1, t_2, u_2, \ldots, t_k, u_k\} \tag{1}$$

Calculate the distance between sessions

To describe user browsing behavior more accurately, and to better reflect the difference in browsing behavior between normal legitimate users and anomalous attacking users, sessions are compared in terms of content, time, page views, and sequence. This paper follows a method that uses three vectors and a matrix to describe the features of a user session in detail. The similarity $\rho$ between sessions is then calculated; the greater the similarity, the smaller the distance, so the abstract distance can be defined as $d = 1/\rho$.

Definition 1 (content vector): $W_k = (w_1, w_2, \ldots, w_n)$ is a vector of length $n$, where $n$ is the number of pages on the server. The similarity of two content vectors is:

$$\rho(W_p, W_q) = \frac{\sum_{i \in [1,n]} \rho\left(W_p^{(i)}, W_q^{(i)}\right)}{n} \tag{2}$$

Definition 2 (time vector): $T_k = (t_1, t_2, \ldots, t_n)$ is a vector of length $n$. Component $t_i$ indicates the time the user spends browsing page $i$. The similarity of two time vectors is:

$$\rho(T_p, T_q) = 1 - d(T_p, T_q) \tag{3}$$

Definition 3 (hit vector): $Hit_k = (hit_1, hit_2, \ldots, hit_n)$ is a vector of length $n$. It indicates the number of times a user browses each page and reflects the user's degree of interest in each page.
The similarity of two hit vectors is:

$$\rho(Hit_p, Hit_q) = 1 - d(Hit_p, Hit_q) \tag{4}$$

Definition 4 (sequence matrix): $H_k$ is an $n \times n$ matrix that records the number of jumps between the pages in the session. The similarity of two sequence matrices is:

$$\rho(H_p, H_q) = \frac{\sum_{i \in (1,n)} \sum_{j \in (1,n)} \rho\left(H_p^{(i,j)}, H_q^{(i,j)}\right)}{n^2} \tag{5}$$

Considering the similarity of the three vectors and the matrix, the overall similarity $\rho(S_p, S_q)$ is:

$$\rho(S_p, S_q) = \frac{\rho(W_p, W_q) + \rho(T_p, T_q) + \rho(Hit_p, Hit_q) + \rho(H_p, H_q)}{4} \tag{6}$$

The greater this value, the more similar the sessions are and the smaller the distance between them. The distance is therefore:

$$d(S_p, S_q) = \frac{1}{\rho(S_p, S_q)} \tag{7}$$

Detection of attacks

The set of sessions is defined as $S = \{S_i,\ i = 1, 2, \ldots, N\}$, where each $S_i$ is an N-dimensional pattern vector. The solution is to find a partition $\omega = \{\omega_1, \omega_2, \ldots, \omega_M\}$ such that the total dispersion of all clusters is minimal.
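The session similarity and distance of equations (2) to (7) can be sketched in Python as follows. This is a minimal illustration under several assumptions that the text does not confirm: the per-component comparison in equations (2) and (5) is taken to be a 0/1 equality test, the vector distance in equations (3) and (4) is taken to be the L1 distance divided by the combined total of both vectors (so that it lies in [0, 1]), and the content vector is assumed to hold 1 for a requested page and 0 otherwise. The `Session` class and all function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Session:
    content: List[int]      # W: 1 if page i was requested, else 0 (assumed encoding)
    time: List[float]       # T: browsing time on page i
    hits: List[int]         # Hit: number of requests for page i
    jumps: List[List[int]]  # H: jumps[i][j] = jumps from page i to page j


def match_fraction(a: Sequence, b: Sequence) -> float:
    """Eqs. (2) and (5) with an ASSUMED 0/1 equality test per component."""
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def normalized_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """ASSUMED vector distance in [0, 1] for non-negative vectors."""
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0  # two all-zero vectors are treated as identical
    return sum(abs(x - y) for x, y in zip(a, b)) / total


def session_similarity(p: Session, q: Session) -> float:
    """Eq. (6): mean of the four component similarities."""
    sim_w = match_fraction(p.content, q.content)            # eq. (2)
    sim_t = 1.0 - normalized_distance(p.time, q.time)       # eq. (3)
    sim_hit = 1.0 - normalized_distance(p.hits, q.hits)     # eq. (4)
    flat_p = [v for row in p.jumps for v in row]            # n*n entries
    flat_q = [v for row in q.jumps for v in row]
    sim_h = match_fraction(flat_p, flat_q)                  # eq. (5)
    return (sim_w + sim_t + sim_hit + sim_h) / 4.0


def session_distance(p: Session, q: Session) -> float:
    """Eq. (7): distance is the reciprocal of the similarity."""
    sim = session_similarity(p, q)
    return math.inf if sim == 0 else 1.0 / sim


# Illustrative data only: a browsing-like session and a flood-like session
# on a hypothetical site with n = 3 pages.
normal = Session([1, 1, 0], [12.0, 30.0, 0.0], [1, 2, 0],
                 [[0, 1, 0], [1, 0, 0], [0, 0, 0]])
attack = Session([1, 0, 0], [0.5, 0.0, 0.0], [50, 0, 0],
                 [[49, 0, 0], [0, 0, 0], [0, 0, 0]])

print(session_distance(normal, normal))
print(session_distance(normal, attack))
```

With these assumptions, a session compared with itself has similarity 1 and therefore distance 1, which is the minimum possible distance under equation (7); less similar sessions have larger distances. A clustering step would consume `session_distance` as its dissimilarity measure.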