Segmentation and detection at IBM: hybrid statistical models and two-tiered clustering

Satya Dharanipragada,Martin Franz,J. S. McCarley,Todd Ward,Wei-Jing Zhu

Segmentation and detection at IBM: hybrid statistical models and two-tiered clustering

2002

Satya Dharanipragada
Martin Franz
J. S. McCarley
Todd Ward
Wei-Jing Zhu

IBM's story segmentation uses a combination of decision tree and maximum entropy models. They take a variety of lexical, prosodic, semantic, and structural features as their inputs. Both types of models are source-specific, and we substantially lower Cseg by combining them. IBM's topic detection system introduces a minimal hierarchy into the clustering: each cluster is comprised of one or more microclusters. We investigate the importance of merging microclusters together, and propose a merging strategy which improves our performance.

Keywords:

Principle of maximum entropy
Cluster analysis
Decision tree
Hierarchy
IBM
Statistical model
Machine learning
Segmentation
Merge (version control)
Pattern recognition
Artificial intelligence
Computer science
Decision tree model

Correction
Source
Cite
Save
Machine Reading By IdeaReader

References

Citations