Activity Detection from Email Meta-Data Clustering

2016 
Information workers in a large enterprise often deal with large volumes of e-mail traffic every day. In such a scenario, automatic detection of the activities they are involved in has many potential uses, and even presenting users with a summary of their current set of activities was found to be of value in itself. In this paper, we describe the problem of automatically detecting user activities from e-mails using only the meta-data of those e-mails, i.e., we do not process e-mail contents. We present a novel two-stage algorithm for automatic activity detection from users' e-mails. In the first stage, we represent the e-mail dataset as a rectangular matrix using features such as other e-mails, the people involved, and the names of the documents attached to the e-mails. We then represent the e-mails in a latent feature space using SVD, followed by further dimensionality reduction using t-Distributed Stochastic Neighbor Embedding (t-SNE), and cluster the e-mails in t-SNE space using a density-based clustering algorithm. In the second stage, we merge these clusters based on group properties and a community detection algorithm on the graph of clusters, to yield our set of automatically detected activities. We analyse public e-mail datasets and present benchmarks of our approach on real-life datasets collected from our target users, and also compare our algorithm with alternative approaches as well as those published in recent literature.
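As a rough illustration of the two ends of this pipeline, the sketch below builds a binary e-mail-by-feature matrix from meta-data and then merges a given set of clusters. It is not the paper's implementation: the record fields (`id`, `reply_to`, `people`, `attachments`), the sample data, and the 0.3 threshold are all assumptions, the SVD, t-SNE, and density-based clustering steps are skipped (the clusters are supplied by hand), and connected components over a people-overlap graph stand in for the community detection algorithm, which the abstract does not name.

```python
from collections import defaultdict

# Hypothetical e-mail meta-data records; all field names are assumptions.
EMAILS = [
    {"id": "m1", "reply_to": None, "people": {"ann", "bob"}, "attachments": {"budget.xlsx"}},
    {"id": "m2", "reply_to": "m1", "people": {"ann", "bob"}, "attachments": set()},
    {"id": "m3", "reply_to": None, "people": {"bob", "cho"}, "attachments": {"budget.xlsx"}},
    {"id": "m4", "reply_to": None, "people": {"dee", "eve"}, "attachments": {"offsite.pdf"}},
]


def build_feature_matrix(emails):
    """Return (columns, rows): a binary e-mail x feature matrix.

    Features are (kind, value) pairs for people, attachment names,
    and the other e-mail a message refers to.
    """
    per_email = []
    all_features = set()
    for e in emails:
        feats = {("person", p) for p in e["people"]}
        feats |= {("attachment", a) for a in e["attachments"]}
        if e["reply_to"] is not None:
            feats.add(("email", e["reply_to"]))
        per_email.append(feats)
        all_features |= feats
    columns = sorted(all_features)
    rows = [[1 if c in feats else 0 for c in columns] for feats in per_email]
    return columns, rows


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_clusters(clusters, emails, threshold=0.3):
    """Merge clusters whose combined people sets overlap enough.

    Connected components (union-find) over the cluster graph are used
    here as a simple stand-in for a community detection algorithm.
    """
    by_id = {e["id"]: e for e in emails}
    people = [set().union(*(by_id[i]["people"] for i in c)) for c in clusters]
    parent = list(range(len(clusters)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            if jaccard(people[i], people[j]) >= threshold:
                parent[find(j)] = find(i)

    merged = defaultdict(list)
    for idx, cluster in enumerate(clusters):
        merged[find(idx)].extend(cluster)
    return [sorted(ids) for ids in merged.values()]
```

For example, `merge_clusters([["m1", "m2"], ["m3"], ["m4"]], EMAILS)` joins the first two clusters because they share a participant, while the unrelated fourth e-mail stays on its own. In a fuller version, the rows of the matrix would be passed through SVD and t-SNE before clustering.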