Incorporating word correlation into tag-topic model for semantic knowledge acquisition
2012
This paper presents a tag-topic model with a Dirichlet Forest prior (TTM-DF) for acquiring semantic knowledge from blogs. TTM-DF extends the tag-topic model (TTM) by replacing the Dirichlet prior over the topic-word multinomial with a Dirichlet Forest prior. Correlations between words are computed to generate a set of Must-Links and Cannot-Links, and the structures of the Dirichlet trees are then obtained by encoding these Must-Link and Cannot-Link constraints. Words under the same subtree are expected to be more strongly correlated than words under different subtrees. We conduct experiments on a synthetic dataset and a blog dataset. On both, TTM-DF performs much better than TTM: it improves the coherence of the underlying topics and of the tag-topic distributions, and captures semantic knowledge effectively.
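The pipeline described above (word correlation, then Must-Links and Cannot-Links, then Dirichlet tree structure) can be sketched as follows. This is a minimal illustration, not the paper's method: Pearson correlation on binary document occurrence, the thresholds `must_thresh` and `cannot_thresh`, the hyperparameters `beta` and `eta`, and all function names are hypothetical choices made for the example. The tree uses a common two-level Must-Link construction. Only Must-Links shape the tree here; Cannot-Links, which a Dirichlet Forest handles by choosing among several alternative tree structures, are computed but not encoded.

```python
import numpy as np


def word_correlation(doc_word: np.ndarray) -> np.ndarray:
    """Assumed measure: Pearson correlation between words on binary occurrence."""
    occ = (doc_word > 0).astype(float)  # rows = documents, columns = words
    with np.errstate(divide="ignore", invalid="ignore"):
        corr = np.corrcoef(occ, rowvar=False)
    return np.nan_to_num(corr)  # words with zero variance get correlation 0


def make_links(corr, must_thresh=0.5, cannot_thresh=-0.3):
    """Hypothetical thresholds: strong positive -> Must-Link, strong negative -> Cannot-Link."""
    must, cannot = [], []
    v = corr.shape[0]
    for i in range(v):
        for j in range(i + 1, v):
            if corr[i, j] >= must_thresh:
                must.append((i, j))
            elif corr[i, j] <= cannot_thresh:
                cannot.append((i, j))
    return must, cannot


def must_link_groups(v, must):
    """Transitive closure of Must-Links via union-find; each group becomes one subtree."""
    parent = list(range(v))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i, j in must:
        parent[find(i)] = find(j)
    groups = {}
    for w in range(v):
        groups.setdefault(find(w), []).append(w)
    return list(groups.values())


def sample_dirichlet_tree(groups, v, beta=0.5, eta=50.0, rng=None):
    """Draw one topic-word distribution from a two-level Dirichlet tree.

    Root edges carry weight len(group) * beta; edges inside a Must-Link
    subtree carry eta * beta, so a large eta keeps linked words at similar
    probabilities. beta and eta values are illustrative assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    root = rng.dirichlet(np.array([len(g) * beta for g in groups]))
    phi = np.zeros(v)
    for mass, g in zip(root, groups):
        if len(g) == 1:
            phi[g[0]] = mass  # singleton word hangs directly off the root
        else:
            phi[g] = mass * rng.dirichlet(np.full(len(g), eta * beta))
    return phi


# Toy corpus: words 0 and 1 always co-occur, as do words 2 and 3.
X = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]])
must, cannot = make_links(word_correlation(X))   # must == [(0, 1), (2, 3)]
groups = must_link_groups(X.shape[1], must)      # groups == [[0, 1], [2, 3]]
phi = sample_dirichlet_tree(groups, X.shape[1], rng=np.random.default_rng(0))
```

Grouping Must-Links transitively keeps the tree shallow: every linked word sits in a single subtree, and the large within-subtree weight makes those words share probability mass, which is one way to realize "words under the same subtree are more correlated".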
Keywords:
- Semantic memory
- Information retrieval
- Data mining
- Dirichlet distribution
- Hierarchical Dirichlet process
- Latent Dirichlet allocation
- Encoding (memory)
- Machine learning
- Topic model
- Correlation
- Multinomial distribution
- Computer science
- Pattern recognition
- Artificial intelligence
- Coherence (physics)
- Natural language processing
References: 17
Citations: 6