Patterns of life in temporal data: indexing and hashing for fast and relevant data retrieval

2014 
As datasets of time-series records, such as computer logs or financial transactions, grow larger, indexing solutions are needed that can efficiently filter out irrelevant records while retrieving most of the relevant ones. These methods must capture the essential temporal properties present in the data and provide a scalable way to generate the index and update it as new records arrive. Current time-series analysis and indexing methods are insufficient because the fixed features they rely on capture only limited periodicity in time-series data and become brittle when the time series encode heterogeneous temporal behaviors and are noisy and incomplete. New indexing solutions must not only cluster the data but also infer its meaningful characteristics and present them to users to improve their understanding of the data. In this paper, we develop an indexing procedure based on typical latent behaviors within the time series. Our method (1) converts the data to a quantized format, (2) learns the identifying behaviors that generate the data, and (3) produces an index for the time series based on these behaviors. The method is found to outperform standard approaches to time-series indexing in terms of recall and precision for varying degrees of data noise.
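The three steps named in the abstract can be illustrated with a small sketch. The abstract does not say how the quantization or the behavior learning is done, so everything here is an assumption made for illustration: equal-width binning stands in for the quantization step, plain k-means over symbol histograms stands in for the latent-behavior model, and all function names (`quantize`, `learn_behaviors`, `build_index`, `query`) are hypothetical. It is not the authors' method.

```python
from collections import defaultdict

def quantize(series, n_bins=4, lo=0.0, hi=1.0):
    """Step 1 (assumed): map each value to an integer bin in [0, n_bins - 1]."""
    width = (hi - lo) / n_bins
    return [min(n_bins - 1, max(0, int((x - lo) / width))) for x in series]

def histogram(symbols, n_bins=4):
    """Normalized symbol frequencies: a crude, order-free summary of one series."""
    counts = [0] * n_bins
    for s in symbols:
        counts[s] += 1
    return [c / len(symbols) for c in counts]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(vector, centroids):
    """Return the id of the closest centroid (the closest 'behavior')."""
    return min(range(len(centroids)), key=lambda j: sq_dist(vector, centroids[j]))

def learn_behaviors(vectors, k=2, iters=10):
    """Step 2 (stand-in): plain k-means; each centroid plays the role of a behavior."""
    centroids = [list(v) for v in vectors[:k]]  # deterministic init: first k vectors
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[assign(v, centroids)].append(v)
        for j, group in enumerate(groups):
            if group:  # keep the old centroid if no vector was assigned to it
                centroids[j] = [sum(col) / len(group) for col in zip(*group)]
    return centroids

def build_index(named_series, k=2, n_bins=4):
    """Step 3: inverted index from behavior id to the names of matching series."""
    names = list(named_series)
    vectors = [histogram(quantize(named_series[n], n_bins), n_bins) for n in names]
    centroids = learn_behaviors(vectors, k)
    index = defaultdict(list)
    for name, vec in zip(names, vectors):
        index[assign(vec, centroids)].append(name)
    return dict(index), centroids

def query(series, index, centroids, n_bins=4):
    """Retrieve only the series filed under the query's closest behavior."""
    vec = histogram(quantize(series, n_bins), n_bins)
    return index.get(assign(vec, centroids), [])

# Toy data, invented for this example only.
data = {
    "low_a": [0.10, 0.20, 0.10, 0.15],
    "high_a": [0.90, 0.80, 0.95, 0.85],
    "low_b": [0.05, 0.10, 0.20, 0.10],
    "high_b": [0.80, 0.90, 0.90, 0.99],
}
index, centroids = build_index(data, k=2)
matches = query([0.12, 0.05, 0.20, 0.10], index, centroids)
```

A query is quantized the same way as the indexed series and compared only against the series filed under its closest behavior, which is how an index of this kind filters out irrelevant records before any detailed comparison. Histograms discard temporal order, so a faithful implementation would need a richer behavior model than this sketch uses.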