D2U: Data Driven User Emulation for the Enhancement of Cyber Testing, Training, and Data Set Generation

2021 
Whether testing intrusion detection systems, conducting training exercises, or creating data sets to be used by the broader cybersecurity community, realistic user behavior is a critical component of a cyber range. Existing methods either rely on network level data or replay recorded user actions to approximate real users in a network. Our work produces generative models trained on actual user data (sequences of application usage) collected from endpoints. Once trained to the user’s behavioral data, these models can generate novel sequences of actions from the same distribution as the training data. These sequences of actions are then fed to our custom software via configuration files, which replicate those behaviors on end devices. Notably, our models are platform agnostic and could generate behavior data for any emulation software package. In this paper we present our model generation process, software architecture, and an investigation of the fidelity of our models. Specifically, we consider two different representations of the behavioral sequences, on which three standard generative models for sequential data—Markov Chain, Hidden Markov Model, and Random Surfer—are employed. Additionally, we examine adding a latent variable to faithfully capture time-of-day trends. Best results are observed when sampling a unique next behavior (regardless of the specific sequential model used) and the duration to take the behavior, paired with the temporal latent variable. Our software is currently deployed in a cyber range to help evaluate the efficacy of defensive cyber technologies, and we suggest additional ways that the cyber community as a whole can benefit from more realistic user behavior emulation.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    20
    References
    0
    Citations
    NaN
    KQI
    []