Recognizing Landmarks in Large-Scale Social Image Collections

2016 
The dramatic growth of social media websites over the last few years has created huge collections of online images and raised new challenges in organizing them effectively. One particularly intuitive way of browsing and searching images is by the geo-spatial location of where on Earth they were taken, but most online images do not have GPS metadata associated with them. We consider the problem of recognizing popular landmarks in large-scale datasets of unconstrained consumer images by formulating a classification problem involving nearly 2 million images and 500 categories. The dataset and categories are formed automatically from geo-tagged photos from Flickr by looking for peaks in the spatial geo-tag distribution corresponding to frequently photographed landmarks. We learn models for these landmarks with a multiclass support vector machine, using classic vector-quantized interest point descriptors as features. We also incorporate the nonvisual metadata available on modern photo-sharing sites, showing that textual tags and temporal constraints lead to significant improvements in classification rate. Finally, we apply recent breakthroughs in deep learning with Convolutional Neural Networks, finding that these models can dramatically outperform the traditional recognition approaches to this problem, and even beat human observers in some cases. (This is an expanded and updated version of an earlier conference paper [23]).
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    42
    References
    8
    Citations
    NaN
    KQI
    []