ConTrack: A Scalable Method for Tracking Multiple Concepts in Large Scale Multidimensional Data
2016
In industrial domains such as finance, telecommunications, the internet, and sensor monitoring, large volumes of unlabeled temporal data, such as financial transactions, sensor measurements, and user activities, are continuously generated. From a data analysis standpoint, there is significant utility in detecting and understanding changes in the data, for example to recognize physical activity, to follow content consumption behavior, or to find anomalies and faults in robots and sensors. However, because the data is unlabeled, it is challenging to visualize and understand in a way that produces interpretable insights; furthermore, the large volume of data imposes a scalability requirement. In the concept drift and stream mining literature, existing methods may address one or two, but rarely all three, of these aspects: unlabeled data, interpretable output, and scalability. To address this need, we propose ConTrack, an unsupervised method that tracks multiple evolving concepts in temporal data and is parallelized over a cluster of machines. To enhance interpretability, our method structures its output at a per-user (or per-actor) level, where users subscribe to one or more evolving concepts. Our method applies to problem settings (multiple concepts, unsupervised data, temporal data, user-oriented data) that cannot be handled by existing concept drift and stream mining methods, and it outperforms popular unsupervised baselines from the wider Data Mining and Machine Learning literature.