Scalable cyber-security analytics with a new summary-based approximate query engine

2017 
A growing need for scalable solutions for both machine learning and interactive analytics exists in the area of cyber-security. Machine learning aims at segmentation and classification of log events, which leads towards optimization of the threat monitoring processes. The tools for interactive analytics are required to resolve the uncertain cases, whereby machine learning algorithms are not able to provide a convincing outcome and human expertise is necessary. In this paper we focus on a case study of a security operations platform, whereby typical layers of information processing are integrated with a new database engine dedicated to approximate analytics. The engine makes it possible for the security experts to query massive log event data sets in a standard relational style. The query outputs are received orders of magnitude faster than any of the existing database solutions running with comparable resources and, in addition, they are sufficiently accurate to make the right decisions about suspicious corner cases. The engine internals are driven by the principles of information granulation and summary-based processing. They also refer to the ideas of data quantization, approximate computing, rough sets and probability propagation. In the paper we study how the engine's parameters can influence its performance within the considered environment. In addition to the results of experiments conducted on large data sets, we also discuss some of our high level design decisions including the choice of an approximate query result accuracy measure that should reflect the specifics of the considered threat monitoring operations.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    27
    References
    14
    Citations
    NaN
    KQI
    []