Feature Extraction for Document Image Segmentation by pLSA Model

Takuma Yamaguchi, Minoru Maruyama

2008

In this paper, we propose a method for document image segmentation based on the pLSA (probabilistic latent semantic analysis) model. The pLSA model was originally developed for topic discovery in text analysis using a "bag-of-words" document representation, and it can also be applied to image analysis using a "bag-of-visual-words" image representation. The performance of the method depends on the visual vocabulary, which is generated by extracting features from the document image. We compare several feature extraction and description methods and examine how each one affects segmentation performance. Our experiments show that the pLSA-based method enables accurate content-based document segmentation.
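
The abstract describes representing document image regions as bags of visual words and fitting a pLSA model to them. The Python sketch below illustrates that general pipeline under several assumptions that do not come from the paper: the codebook is given up front (in practice it is often built by k-means clustering of training descriptors), 2-D toy descriptors stand in for real local features, each "document" is a fixed image block, and each block is labeled by its most probable topic. The helper names (`quantize`, `bag_of_visual_words`, `plsa`, `segment`) are hypothetical. The EM updates are the standard pLSA ones, not necessarily the authors' exact procedure.

```python
import math
import random


def quantize(descriptors, codebook):
    """Assign each local descriptor to its nearest codeword (squared
    Euclidean distance); the codeword index is its 'visual word'."""
    words = []
    for d in descriptors:
        dists = [sum((a - b) ** 2 for a, b in zip(d, c)) for c in codebook]
        words.append(dists.index(min(dists)))
    return words


def bag_of_visual_words(word_ids, vocab_size):
    """Histogram n(d, w): how often each visual word occurs in block d."""
    hist = [0] * vocab_size
    for w in word_ids:
        hist[w] += 1
    return hist


def normalize(v):
    """Scale a non-negative vector to sum to 1 (uniform if it is all zero)."""
    s = sum(v)
    return [x / s for x in v] if s > 0 else [1.0 / len(v)] * len(v)


def log_likelihood(counts, p_w_z, p_z_d):
    """Conditional log-likelihood sum_{d,w} n(d,w) log sum_z P(w|z) P(z|d)."""
    ll = 0.0
    for d, row in enumerate(counts):
        for w, n in enumerate(row):
            if n:
                p = sum(p_w_z[z][w] * p_z_d[d][z] for z in range(len(p_w_z)))
                ll += n * math.log(p)
    return ll


def plsa(counts, n_topics, n_iter=100, seed=0):
    """Fit P(w|z) and P(z|d) by EM; also return the log-likelihood
    recorded before each iteration and after the last one."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    p_w_z = [normalize([rng.random() for _ in range(n_words)]) for _ in range(n_topics)]
    p_z_d = [normalize([rng.random() for _ in range(n_topics)]) for _ in range(n_docs)]
    history = []
    for _ in range(n_iter):
        history.append(log_likelihood(counts, p_w_z, p_z_d))
        acc_w_z = [[0.0] * n_words for _ in range(n_topics)]
        acc_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(w|z) P(z|d)
                post = normalize([p_w_z[z][w] * p_z_d[d][z] for z in range(n_topics)])
                # Accumulate expected counts n(d,w) P(z|d,w)
                for z in range(n_topics):
                    acc_w_z[z][w] += n * post[z]
                    acc_z_d[d][z] += n * post[z]
        # M-step: renormalize the expected counts into distributions
        p_w_z = [normalize(row) for row in acc_w_z]
        p_z_d = [normalize(row) for row in acc_z_d]
    history.append(log_likelihood(counts, p_w_z, p_z_d))
    return p_w_z, p_z_d, history


def segment(p_z_d):
    """Assumed labeling rule: each block gets its most probable topic."""
    return [max(range(len(row)), key=row.__getitem__) for row in p_z_d]


# Hypothetical toy data: a precomputed 4-word codebook (assumed, e.g. from
# k-means) and four image blocks described by 2-D descriptors.
codebook = [(0, 0), (0, 1), (10, 0), (10, 1)]
blocks = [
    [(0, 0), (0, 1), (0, 0), (0.1, 0.9)],
    [(0, 1), (0, 0), (0, 1)],
    [(10, 0), (10, 1), (9.8, 0.2)],
    [(10, 1), (10, 1), (10, 0)],
]
counts = [bag_of_visual_words(quantize(b, codebook), len(codebook)) for b in blocks]
p_w_z, p_z_d, history = plsa(counts, n_topics=2)
labels = segment(p_z_d)
print(counts)
print(labels)
```

EM only guarantees that the log-likelihood in `history` never decreases and that it reaches a local optimum, so the topic indices in `labels` are arbitrary and the grouping can depend on `seed`. Running several seeds and keeping the fit with the highest final log-likelihood is a common safeguard.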

Keywords:

Bag-of-words model
Image segmentation
Scale-space segmentation
Topic model
Bag-of-words model in computer vision
Feature extraction
Text mining
Artificial intelligence
Pattern recognition
Computer science
Probabilistic latent semantic analysis
Support vector machine
Segmentation
Visualization
Natural language processing