Big Data Series Analytics Using TARDIS and its Exploitation in Geospatial Applications

2020 
The massive amounts of data series data continuously generated and collected by applications require new indices to speed up data series similarity queries on which various data mining techniques rely. However, the state-of-the-art iSAX-based indexing techniques do not scale well due to the binary fanout that leads to a highly deep index tree and suffer from accuracy degradation due to the character-level cardinality that leads to poor maintenance of the proximity. To address this problem, we recently proposed TARDIS to supports indexing and querying billion-scale data series datasets. It introduces a new iSAX-T signatures to reduce the cardinality conversion cost and corresponding sigTree to construct a compact index structure to preserve better similarity. The framework consists of one centralized index and local distributed indices to efficiently re-partition and index dimensional datasets. Besides, effective query strategies based on sigTree structure are proposed to greatly improve the accuracy. In this demonstration, we present GENET, a new interactive exploration demonstration that allows users to support Big Data Series Approximate Retrieval and Recursive Interactive Clustering in large-scale geospatial datasets using TARDIS index techniques.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    4
    References
    0
    Citations
    NaN
    KQI
    []