Early versus Late Dimensionality Reduction of Bag-of-Words Feature Representation for Image Classification

Chih-Fong Tsai, Ya-Han Hu, Wei-Chao Lin, Ming-Chang Wang

2017

Extracting the bag-of-words (BoW) feature from images has been widely used for image classification. In general, local keypoints are first detected in each image, and a keypoint descriptor such as the scale-invariant feature transform (SIFT) is extracted for each of them. The keypoint descriptors of a given image dataset are then tokenized (or clustered) to generate a visual-word vocabulary (or codebook). Next, each image is described by a visual-word vector that records the presence or absence of each visual word in the image, for example the number of the image's keypoints that fall into the corresponding cluster (i.e., visual word). Consequently, each image is represented as a histogram over the visual words. Since both the SIFT keypoint descriptor and the final BoW feature used for image classification are high-dimensional, this paper examines how performing dimensionality reduction (DR) on each of these two features affects classification accuracy. In particular, early DR is applied to the SIFT descriptor and late DR to the BoW feature. Experimental results on the Caltech 101 (2-D images) and ESB (3-D images) datasets show that reducing the dimensionality of the SIFT descriptor by 50% using PCA allows the SVM classifier to perform similarly to the classifier without DR. On the other hand, late DR works only for 2-D images, but the classification performance of the SVM cannot be maintained once more than 25% of the BoW feature's dimensionality is removed.
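To make the difference between early and late DR concrete, the following is a minimal NumPy sketch, not the authors' implementation. It rests on several assumptions: random 128-D vectors stand in for real SIFT descriptors, the codebook is built with plain k-means (the abstract only says the descriptors are "tokenized (or clustered)"), early DR fits PCA on the pooled descriptors and builds the codebook in the reduced space, and late DR fits PCA on the BoW histograms. The helper names (`pca_fit`, `kmeans`, `bow_histogram`, and so on), the codebook size `k`, and all dataset sizes are hypothetical, and the SVM classification stage is omitted.

```python
import numpy as np

def pca_fit(X, n_components):
    """Fit PCA via SVD; return (mean, projection matrix of shape (d, n_components))."""
    mean = X.mean(axis=0)
    # Rows of Vt are orthonormal principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components].T

def pca_transform(X, mean, W):
    """Project centered data onto the retained principal directions."""
    return (X - mean) @ W

def assign(X, centers):
    """Index of the nearest visual word (squared Euclidean distance) for each descriptor."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means (assumed clustering method); returns a (k, d) codebook."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(X, centers)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the previous center if a cluster becomes empty
                centers[j] = members.mean(axis=0)
    return centers

def bow_histogram(descriptors, centers):
    """Count how many of one image's keypoints fall into each visual word."""
    return np.bincount(assign(descriptors, centers), minlength=len(centers)).astype(float)

rng = np.random.default_rng(42)
n_images, kps_per_image, sift_dim, k = 20, 30, 128, 16
# Hypothetical stand-in for SIFT: 30 random 128-D descriptors per image.
images = [rng.random((kps_per_image, sift_dim)) for _ in range(n_images)]
all_desc = np.vstack(images)

# Early DR: PCA on the pooled descriptors (128 -> 64, a 50% reduction), then cluster.
mean_e, W_e = pca_fit(all_desc, sift_dim // 2)
reduced = [pca_transform(d, mean_e, W_e) for d in images]
codebook_e = kmeans(np.vstack(reduced), k)
early = np.array([bow_histogram(d, codebook_e) for d in reduced])

# Late DR: cluster the raw descriptors, then PCA on the BoW histograms (16 -> 12, a 25% reduction).
codebook_l = kmeans(all_desc, k)
bow = np.array([bow_histogram(d, codebook_l) for d in images])
mean_l, W_l = pca_fit(bow, int(k * 0.75))
late = pca_transform(bow, mean_l, W_l)

print(early.shape, late.shape)  # (20, 16) (20, 12)
```

The design point the sketch illustrates is where the reduction happens: early DR changes the space in which the visual words are learned, so the BoW histogram keeps its full length `k`, whereas late DR leaves the codebook untouched and shrinks the final feature vector that would be passed to the classifier.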
