Early versus Late Dimensionality Reduction of Bag-of-Words Feature Representation for Image Classification

2017 
The bag-of-words (BoW) feature has been widely used for image classification. In general, local keypoints are first detected in each image, and a keypoint descriptor, such as the scale-invariant feature transform (SIFT), is extracted for each of them. The keypoint descriptors of a given image dataset are then tokenized (or clustered) to generate a visual-word vocabulary (or codebook). The visual-word vector of an image then records the presence or absence of each visual word in that image, for example as the number of keypoints assigned to the corresponding cluster (i.e., visual word). Consequently, each image is represented by a histogram over visual words. Since both the SIFT keypoint descriptor and the final BoW feature used for image classification are high-dimensional, this paper examines the effect on classification accuracy of performing dimensionality reduction (DR) on each of these features. In particular, early DR is applied to the SIFT descriptor and late DR to the BoW feature. The experimental results on the Caltech 101 (2-D images) and ESB (3-D images) datasets show that reducing the dimensionality of the SIFT descriptor by 50% with PCA allows the SVM classifier to perform similarly to one trained without DR. Late DR, on the other hand, works only for 2-D images, and even there the classification performance of the SVM cannot be maintained if the dimensionality of the BoW feature is reduced by more than 25%.
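To make the distinction between early and late DR concrete, the following is a minimal NumPy sketch of the two pipelines. It is not the authors' implementation. Random vectors stand in for real 128-dimensional SIFT descriptors, and the image count, vocabulary size, helper names (`pca_fit`, `kmeans`, `bow_histogram`) and the plain Lloyd's k-means are all assumptions made for illustration. The SVM classification stage is omitted.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return (mean, components) for projecting rows of X onto the top principal axes."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_transform(X, mean, components):
    return (X - mean) @ components.T

def assign(X, centres):
    """Index of the nearest centre (visual word) for every row of X."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(X, k, n_iter=10, seed=0):
    """Plain Lloyd's k-means; returns the k cluster centres (the codebook)."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(X, centres)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centre if a cluster is empty
                centres[j] = members.mean(axis=0)
    return centres

def bow_histogram(descriptors, centres):
    """Count how many descriptors of one image fall into each visual word."""
    return np.bincount(assign(descriptors, centres), minlength=len(centres)).astype(float)

# Synthetic stand-in for SIFT output (assumption): 30 images, 20 descriptors each.
rng = np.random.default_rng(42)
n_images, n_keypoints, sift_dim, vocab_size = 30, 20, 128, 20
images = [rng.normal(size=(n_keypoints, sift_dim)) for _ in range(n_images)]
all_desc = np.vstack(images)

# Early DR: reduce the descriptors by 50% (128 -> 64) before building the codebook.
mean_d, comp_d = pca_fit(all_desc, sift_dim // 2)
reduced_desc = pca_transform(all_desc, mean_d, comp_d)
codebook_early = kmeans(reduced_desc, vocab_size)
early_bow = np.array([
    bow_histogram(pca_transform(img, mean_d, comp_d), codebook_early) for img in images
])

# Late DR: build the BoW feature from the raw descriptors, then reduce it by 25% (20 -> 15).
codebook_raw = kmeans(all_desc, vocab_size)
raw_bow = np.array([bow_histogram(img, codebook_raw) for img in images])
mean_b, comp_b = pca_fit(raw_bow, int(vocab_size * 0.75))
late_bow = pca_transform(raw_bow, mean_b, comp_b)

print(reduced_desc.shape, early_bow.shape, late_bow.shape)
```

In this sketch, early DR leaves the final feature length equal to the vocabulary size and only shortens the vectors being clustered, whereas late DR shortens the feature that would be passed to the classifier. The 50% and 25% reductions mirror the thresholds reported above but are otherwise arbitrary here.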