Schema Independent Reduction of Streaming Log Data

2014 
Large software systems comprise of different and tightly interconnected components. Such systems utilize heterogeneous monitoring infrastructures which produce log data at high rates from various sources and in diverse formats. The sheer volume of this data makes almost impossible the real- or near real-time processing of these system logs. In this paper, we present a log schema independent approach that allows for the real time reduction of logged data based on a set of filtering criteria. The approach utilizes a similarity measure between features of the incoming events and a set of filtering features we refer to as beacons. The similarity measure is based on information theory principles and uses caching techniques so that infinite log data streams and log data schema alterations can be handled. The approach has been applied successfully on the KDD-99 intrusion detection benchmark data set.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    24
    References
    4
    Citations
    NaN
    KQI
    []