Enabling reproducible cyber research - four labeled datasets

2016 
In this paper, we describe the design and creation of four publicly available datasets generated using a testbed with simulated benign users and a manual attacker. The datasets were created to provide examples of cyber exploitations and aid in the production of reproducible research that address cyber security challenges. The CyberVAN testbed provides sophisticated capabilities for high-fidelity cyber experimentation in strategic and tactical network environments. The representative network is sufficiently complex with synthetic users performing normal duties that generate traffic (webpage browsing, e-mail, etc.). Both network and host based facts/logs are included in the dataset along with a diagram of the network and a timeline of events. The four datasets encompass progressively complex scenarios: 1) malware infection injection via a phishing email attachment; 2) propagating botnet injection via phishing email attachment with a Single Fast Flux algorithm for bot master identification/communication; 3) propagating botnet injection via email link using a Domain Generation Algorithm for bot master identification/communication; 4) propagating botnet injection via corruption of a legitimate internal web server with Double Fast Flux for bot master identification/communication. The full datasets along with relevant documentation is available for public download. Additional datasets containing tactical network scenarios and environments will be added to the repository in the future with the goal of enabling reproducible cyber security research that will advance the science of cyber security.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    6
    References
    10
    Citations
    NaN
    KQI
    []