Topic Analysis of Web User Behavior Using LDA Model on Proxy Logs

2011 
We propose a web user profiling and clustering framework based on LDA (latent Dirichlet allocation) topic modeling. It draws an analogy to document analysis in which users correspond to documents and their web access actions correspond to words. The main technical challenge addressed here is how to represent web access actions, monitored through a web proxy, as words. We develop a hierarchical URL dictionary generated from Yahoo! Directory and a cross-hierarchical matching method that performs automatic abstraction. We apply the proposed framework to 7,500 students at Osaka University. The results include 24 topics, such as “Technology Oriented”, “Job Hunting”, and “SNS-addict”. These topics reflect the typical interest profiles of university students, and perplexity analysis is used to confirm the optimality of the framework.
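The symbolization step, which turns proxy-log URLs into words and groups them into one "document" per user, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method. The dictionary entries, the hypothetical names `URL_DICTIONARY`, `url_to_word`, and `build_user_documents`, the longest-prefix matching rule, and the use of depth truncation as a stand-in for abstraction are all assumptions. The paper's cross-hierarchical matching method is not reproduced here.

```python
from collections import Counter
from urllib.parse import urlsplit

# HYPOTHETICAL dictionary: URL prefix (host + optional path) -> category path.
# It is loosely modeled on a web directory hierarchy; the entries are made up.
URL_DICTIONARY = {
    "example-shop.com": ("Business_and_Economy", "Shopping"),
    "example-news.com": ("News_and_Media",),
    "example-news.com/tech": ("News_and_Media", "Technology"),
    "example-sns.com": ("Society_and_Culture", "SNS"),
}


def url_to_word(url, max_depth=2):
    """Map a URL to a category 'word' by longest-prefix match (assumed rule)."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # .hostname is already lowercased
    if host.startswith("www."):
        host = host[4:]
    segments = [s for s in parts.path.split("/") if s]
    # Try the most specific key first (host + all path segments), then back off.
    for n in range(len(segments), -1, -1):
        key = "/".join([host] + segments[:n])
        if key in URL_DICTIONARY:
            # Truncating the category path is a crude stand-in for abstraction.
            return "/".join(URL_DICTIONARY[key][:max_depth])
    return None  # unmatched URLs are dropped


def build_user_documents(log_records, max_depth=2):
    """Group (user_id, url) proxy-log records into per-user bags of words."""
    docs = {}
    for user_id, url in log_records:
        word = url_to_word(url, max_depth)
        if word is not None:
            docs.setdefault(user_id, Counter())[word] += 1
    return docs


# Usage with made-up log records.
logs = [
    ("u1", "http://www.example-news.com/tech/article1"),
    ("u1", "https://example-shop.com/item"),
    ("u2", "http://example-sns.com/home"),
    ("u2", "http://unknown.org/"),
]
docs = build_user_documents(logs)
print(dict(docs["u1"]))
# prints {'News_and_Media/Technology': 1, 'Business_and_Economy/Shopping': 1}
```

Each per-user `Counter` is a bag of words of the kind an LDA implementation takes as input. Lowering `max_depth` merges fine-grained categories into their parents, which shrinks the vocabulary at the cost of detail.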