Detecting Unknown DGAs Using Distances Between Feature Vectors of Domain Names

2021 
Many botnets adopt domain generation algorithms (DGAs) to set up stealthy Command & Control (C2) communication. A DGA generates a large number of domain names, and the attacker selects some of them to map to the C2 servers. In this paper, we propose Talos, a DGA detection approach that accurately detects both unknown and known DGAs. The key insight of Talos is that domain names can be represented by feature vectors whose pairwise distances reflect whether the domain names belong to the same class. Talos uses a neural language model to extract the feature vector of a domain name. It then decides whether the feature vector belongs to a class by checking whether the vector lies within the boundary of the class and near the centroid of the class. We evaluate the detection ability of Talos on both unknown and known DGAs. Our experimental results show that Talos achieves recall over 92% on unknown classes and F1-score over 95% on known classes. We also compare Talos with state-of-the-art detection approaches and find that Talos detects unknown DGAs substantially better than they do.
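The centroid-and-boundary decision described in the abstract can be illustrated with a minimal sketch. This is not the Talos implementation: the abstract does not say how the class boundary is defined, so the sketch assumes a boundary equal to the largest distance from any training vector to its class centroid, uses Euclidean distance, and replaces the neural language model with hand-written two-dimensional toy vectors. The names `fit_classes` and `classify` and the labels `benign` and `dga_a` are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def euclidean(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_classes(labeled_vectors):
    """Map each label to (centroid, radius).

    Assumption: the class boundary is a sphere whose radius is the
    largest centroid distance seen among that class's training vectors.
    """
    model = {}
    for label, vecs in labeled_vectors.items():
        c = centroid(vecs)
        radius = max(euclidean(v, c) for v in vecs)
        model[label] = (c, radius)
    return model


def classify(model, vector):
    """Return the nearest class whose boundary contains the vector.

    A vector outside every class boundary is reported as "unknown",
    which stands in for a previously unseen DGA family.
    """
    best_label, best_dist = "unknown", math.inf
    for label, (c, radius) in model.items():
        d = euclidean(vector, c)
        if d <= radius and d < best_dist:
            best_label, best_dist = label, d
    return best_label


# Toy feature vectors (hypothetical; real ones would come from a
# feature extractor such as a neural language model).
train = {
    "benign": [[0.0, 0.0], [0.2, 0.1], [0.1, 0.2]],
    "dga_a": [[5.0, 5.0], [5.2, 4.9], [4.9, 5.1]],
}
model = fit_classes(train)

print(classify(model, [0.1, 0.1]))    # close to the benign centroid
print(classify(model, [5.0, 5.0]))    # inside the dga_a boundary
print(classify(model, [10.0, -3.0]))  # outside every boundary
```

Because a vector is accepted only when it falls inside some class boundary, anything far from all known centroids is flagged as unknown instead of being forced into the nearest known class.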