Memory bounded inference in topic models

Ryan G. Gomes, Max Welling, Pietro Perona

2008

What types of algorithms and statistical techniques support learning from very large datasets over long stretches of time? We address this question through a memory-bounded version of a variational EM algorithm that approximates inference in a topic model. The algorithm alternates between two phases, "model building" and "model compression", so that a given memory constraint is always satisfied. The model building phase uses Bayesian model selection to expand its internal representation (the number of topics) as more data arrives. Compression is achieved by merging data items into clumps and caching only their sufficient statistics. Empirically, the resulting algorithm is able to handle datasets that are orders of magnitude larger than those the standard batch version can handle.
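The compression idea can be illustrated with a minimal sketch: each clump keeps only summed statistics for the documents it absorbs, so memory depends on the number of clumps rather than the number of documents. Everything here is an assumption made for illustration. The `Clump` class, the `compress` function, the bag-of-words statistics, and the greedy closest-pair merge rule are hypothetical; the abstract does not say how clumps are chosen, and the paper's variational EM updates and model selection are not reproduced.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Clump:
    """Hypothetical container: cached sufficient statistics for a group of documents."""
    n_docs: int                # number of documents absorbed into this clump
    word_counts: List[float]   # summed bag-of-words counts over those documents

    def merge(self, other: "Clump") -> "Clump":
        # Sufficient statistics are additive, so merging is element-wise addition.
        return Clump(
            self.n_docs + other.n_docs,
            [a + b for a, b in zip(self.word_counts, other.word_counts)],
        )


def mean_distance(a: Clump, b: Clump) -> float:
    # Squared Euclidean distance between per-document mean count vectors.
    # Assumed merge criterion, chosen only to keep the sketch simple.
    return sum(
        (x / a.n_docs - y / b.n_docs) ** 2
        for x, y in zip(a.word_counts, b.word_counts)
    )


def compress(clumps: List[Clump], max_clumps: int) -> List[Clump]:
    """Greedily merge the closest pair until at most max_clumps remain."""
    clumps = list(clumps)
    while len(clumps) > max(max_clumps, 1):
        pairs = (
            (i, j)
            for i in range(len(clumps))
            for j in range(i + 1, len(clumps))
        )
        i, j = min(pairs, key=lambda ij: mean_distance(clumps[ij[0]], clumps[ij[1]]))
        merged = clumps[i].merge(clumps[j])
        clumps = [c for k, c in enumerate(clumps) if k not in (i, j)] + [merged]
    return clumps


# Usage: three single-document clumps compressed into a budget of two.
docs = [Clump(1, [2, 0]), Clump(1, [3, 0]), Clump(1, [0, 4])]
compressed = compress(docs, max_clumps=2)
```

After compression the original documents can no longer be recovered individually, but the totals are preserved, so any update that depends on the data only through these sums is unaffected by the merge.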

Keywords:

Sufficient statistic
Machine learning
Expectation–maximization algorithm
Satisfiability
Artificial intelligence
Inference
Topic model
Bounded function
Model building
Mathematics
Bayesian inference
Merge (version control)
Computer science
