Exploiting generic multi-level convolutional neural networks for scene understanding

2016 
In this paper, we introduce the application of generic multi-level Convolutional Neural Networks (CNN) approach into the scene understanding or image parsing task. Given an input image, first, a set of similar images from the training set are retrieved based on global-level CNN feature matching similarities. Then, the input test image and the similar images are overseg-mented into superpixels. Next, the class of each test image's superpixel is initialized by the majority vote of the fc-nearest-neighbor superpixels based on regional-level CNN features and hand-crafted features matching. The initial superpixel parsing is later combined with per-exemplar sliding windows to roughly form the pixel labels. Eventually, the final labels are further refined by the contextual smoothing. Extensive experiments on different challenging datasets demonstrate the potentials of the proposed method.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    28
    References
    5
    Citations
    NaN
    KQI
    []