Leveraging 2D and 3D cues for fine-grained object classification

2016 
Objects in fine-grained categories share a high degree of shape similarity, which makes both “localizing discriminative parts” and “learning appearance descriptors” extremely difficult. We propose a framework that leverages 2D and 3D cues to address these two challenges. To align images for localizing discriminative parts, traditional methods rely on either manual part annotation or image segmentation; our framework instead aligns images using each image's estimated 3D camera pose. To learn appearance descriptors under limited training data and tight memory and computation budgets, we propose an unsupervised method that combines Convolutional Sparse Coding (CSC) with manifold learning, which significantly reduces model complexity yet still produces highly diverse feature filters, like those of a deep neural network. Our experimental results show that the framework's accuracy is comparable to that of a deep network, demonstrating its great potential on mobile devices.
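For readers unfamiliar with CSC, the commonly used textbook form of the objective is sketched below. This is not taken from the paper, whose exact formulation and manifold-learning step are not given in the abstract. The symbols are assumptions introduced for illustration: $x$ is an input image, $d_k$ are the $K$ learned filters, $z_k$ are the sparse feature maps, $*$ denotes 2D convolution, and $\lambda$ weights the sparsity penalty.

```latex
\min_{\{d_k\},\,\{z_k\}} \;
  \frac{1}{2} \Bigl\| x - \sum_{k=1}^{K} d_k * z_k \Bigr\|_2^2
  \;+\; \lambda \sum_{k=1}^{K} \| z_k \|_1
\qquad \text{subject to } \| d_k \|_2 \le 1 \ \text{ for } k = 1, \dots, K
```

The filters are learned without labels because the objective only asks for a sparse reconstruction of the input, which is consistent with the unsupervised setting the abstract describes.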