Segmentation and detection at IBM: hybrid statistical models and two-tiered clustering
2002
IBM's story segmentation uses a combination of decision tree and maximum entropy models. They take a variety of lexical, prosodic, semantic, and structural features as their inputs. Both types of models are source-specific, and we substantially lower Cseg by combining them. IBM's topic detection system introduces a minimal hierarchy into the clustering: each cluster is comprised of one or more microclusters. We investigate the importance of merging microclusters together, and propose a merging strategy which improves our performance.
Keywords:
- Correction
- Source
- Cite
- Save
- Machine Reading By IdeaReader
5
References
24
Citations
NaN
KQI