Online Clustering for Topic Detection in Social Data Streams

2016 
Microblogs have become an important source of information about events happening in a given location during a given time period. Analyzing and clustering these streams of short text messages is an active research area that is attracting interest from both public and private organizations, since the extracted knowledge can be used to better understand people's behavior and to detect the onset of emergency situations. Clustering these streams requires efficient algorithms capable of analyzing this continuous deluge of data. The paper proposes an online algorithm that incrementally groups tweet streams into clusters. The approach summarizes the tweets examined so far in the centroids of the clusters generated up to that point. The assignment of a tweet to a cluster uses a similarity measure that takes into account both the cluster age and the terms occurring in the tweet. Experiments on messages posted by users in the Manhattan area show that the method is able to extract events that actually took place in the examined period.
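The incremental assignment step described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not give the similarity formula, so the sketch assumes cosine similarity over raw term counts, multiplied by an exponential decay on the time since the cluster was last updated. The class name `OnlineClusterer` and the parameters `threshold` and `half_life` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class OnlineClusterer:
    """Single-pass clustering sketch; each tweet is seen exactly once."""

    def __init__(self, threshold=0.3, half_life=3600.0):
        self.threshold = threshold  # assumed: minimum similarity to join a cluster
        self.half_life = half_life  # assumed: seconds for the age weight to halve
        self.clusters = []          # each cluster: centroid, size, last_update

    def _similarity(self, terms, cluster, now):
        # Assumption: "cluster age" is the time since the last update,
        # and it damps the term similarity exponentially.
        age = max(0.0, now - cluster["last_update"])
        decay = 0.5 ** (age / self.half_life)
        return decay * cosine(terms, cluster["centroid"])

    def add(self, text, timestamp):
        """Assign one tweet to a cluster and return that cluster's index."""
        terms = Counter(text.lower().split())
        best, best_sim = None, 0.0
        for i, cluster in enumerate(self.clusters):
            sim = self._similarity(terms, cluster, timestamp)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is None or best_sim < self.threshold:
            # No sufficiently similar, sufficiently recent cluster: open a new one.
            self.clusters.append(
                {"centroid": Counter(terms), "size": 1, "last_update": timestamp}
            )
            return len(self.clusters) - 1
        cluster = self.clusters[best]
        # The centroid keeps summed term counts; cosine ignores the scale,
        # so no division by the cluster size is needed.
        cluster["centroid"].update(terms)
        cluster["size"] += 1
        cluster["last_update"] = timestamp
        return best
```

Under these assumptions, a tweet joins the most similar recent cluster and the tweet itself is then discarded, so memory grows with the number of clusters rather than with the number of tweets. A stale cluster stops attracting new tweets even when the vocabulary matches, which is one plausible reading of how cluster age could enter the similarity measure.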