Hybrid-Order and Multi-Stream Convolutional Neural Network for Fine-Grained Visual Recognition

2019 
Fine-grained visual recognition is challenging because inter-category differences are subtle compared with the conspicuous intra-category variations. Bilinear pooling, which captures the second-order statistics of convolutional features, has proved effective in such tasks. Because common bilinear methods neglect the original shallow feature information extracted from basic convolutional neural networks, we introduce a novel weakly supervised Hybrid-order and Multi-stream Convolutional Neural Network (HM-CNN) to address this problem. The model applies multi-scale fusion to integrate feature maps at all scales, and then employs hybrid-order pooling to combine the first-order statistics with the second-order bilinear features across spatial locations. Additionally, a cross multi-stream framework built on three basic networks is utilized to enhance the robustness of our model. Results demonstrate that HM-CNN significantly improves accuracy, by 1-3% over state-of-the-art models, on three popular fine-grained recognition datasets.
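The hybrid-order pooling idea can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that the first-order statistic is the spatial average of a single feature map, that the second-order statistic is the spatially averaged outer product of channel vectors (standard bilinear pooling), and that the two are concatenated and then normalized. The function name `hybrid_order_pool`, the signed square-root step, and the L2 normalization are illustrative assumptions. Multi-scale fusion and the multi-stream framework are not shown.

```python
import numpy as np


def hybrid_order_pool(features):
    """Pool a (C, H, W) feature map into a vector of length C + C*C.

    Hypothetical helper: the normalization choices are assumptions
    borrowed from common bilinear-pooling practice, not from the paper.
    """
    c, h, w = features.shape
    x = features.reshape(c, h * w)  # one column per spatial location

    # First-order statistics: channel-wise mean over spatial locations.
    first = x.mean(axis=1)

    # Second-order statistics: outer products averaged over locations.
    second = (x @ x.T) / (h * w)
    second = second.reshape(-1)

    # Assumed normalization: signed square root on the bilinear part.
    second = np.sign(second) * np.sqrt(np.abs(second))

    # Concatenate both orders, then L2-normalize the whole descriptor.
    desc = np.concatenate([first, second])
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc


rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 4, 4))  # toy feature map: 8 channels, 4x4 grid
desc = hybrid_order_pool(fmap)
print(desc.shape)  # (72,) = 8 first-order + 64 second-order entries
```

In a full model, a descriptor of this kind would feed a linear classifier; how the paper weights or fuses the two orders is not stated in the abstract.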