Feature Extraction for Document Image Segmentation by pLSA Model
2008
In this paper, we propose a method for document image segmentation based on the pLSA (probabilistic latent semantic analysis) model. The pLSA model was originally developed for topic discovery in text analysis using a "bag-of-words" document representation, and it can also be applied to image analysis through a "bag-of-visual-words" image representation. The performance of the method depends on the visual vocabulary, which is generated by extracting features from the document image. We compare several feature extraction and description methods and examine how each affects segmentation performance. The experiments show that the pLSA-based method makes accurate content-based document segmentation possible.
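The sketch below illustrates the two ingredients the abstract names: a bag-of-visual-words histogram and a pLSA model fitted by the standard EM procedure, where P(w|d) = sum over z of P(z|d) P(w|z). It assumes that a visual codebook (for example, cluster centers of local descriptors) is already available and that each image block is treated as a "document". The function names `bag_of_visual_words` and `plsa` are hypothetical, and the example is not the paper's exact procedure.

```python
import random


def bag_of_visual_words(descriptors, codebook):
    """Assign each local descriptor to its nearest codeword (squared Euclidean
    distance) and return the histogram of codeword counts."""
    counts = [0] * len(codebook)
    for d in descriptors:
        best = min(range(len(codebook)),
                   key=lambda k: sum((a - b) ** 2 for a, b in zip(d, codebook[k])))
        counts[best] += 1
    return counts


def _normalize(row):
    """Scale a non-negative row to sum to 1 (uniform if the row is all zeros)."""
    s = sum(row)
    return [x / s for x in row] if s > 0 else [1.0 / len(row)] * len(row)


def plsa(counts, n_topics, n_iter=100, seed=0):
    """Fit pLSA by EM on a document-by-word count matrix n(d, w).

    Returns (P(z|d) as docs x topics, P(w|z) as topics x words).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    # Random positive initialization breaks the symmetry between topics.
    p_z_d = [_normalize([rng.random() for _ in range(n_topics)]) for _ in range(n_docs)]
    p_w_z = [_normalize([rng.random() for _ in range(n_words)]) for _ in range(n_topics)]

    for _ in range(n_iter):
        acc_w_z = [[0.0] * n_words for _ in range(n_topics)]
        acc_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(w|z) P(z|d).
                post = [p_w_z[z][w] * p_z_d[d][z] for z in range(n_topics)]
                total = sum(post)
                if total == 0:
                    continue
                # Accumulate expected counts n(d,w) P(z|d,w).
                for z in range(n_topics):
                    r = n * post[z] / total
                    acc_w_z[z][w] += r
                    acc_z_d[d][z] += r
        # M-step: renormalize the expected counts into distributions.
        p_w_z = [_normalize(row) for row in acc_w_z]
        p_z_d = [_normalize(row) for row in acc_z_d]
    return p_z_d, p_w_z


if __name__ == "__main__":
    # Hypothetical toy data: blocks 0-1 use visual words 0-2, blocks 2-3 use words 3-5.
    counts = [[5, 4, 3, 0, 0, 0], [4, 5, 3, 0, 0, 0],
              [0, 0, 0, 5, 4, 3], [0, 0, 0, 3, 4, 5]]
    p_z_d, _ = plsa(counts, n_topics=2, n_iter=200, seed=1)
    labels = [max(range(2), key=lambda z: row[z]) for row in p_z_d]
    print(labels)
```

Each block is labeled here by its most probable topic. This reflects the segmentation idea: regions that share visual-word statistics, such as text versus pictures, fall into the same latent topic. The resulting quality depends on the codebook, and the codebook in turn depends on the feature extractor. That dependence is what the paper's comparison examines.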
Keywords:
- Bag-of-words model
- Image segmentation
- Scale-space segmentation
- Topic model
- Bag-of-words model in computer vision
- Feature extraction
- Text mining
- Artificial intelligence
- Pattern recognition
- Computer science
- Probabilistic latent semantic analysis
- Support vector machine
- Segmentation
- Visualization
- Natural language processing