Multimedia annotation through search and mining

2009 
Multimedia annotation is an application of computer vision in which objects or concepts recognized in a multimedia document are presented as text labels. Typically, annotation algorithms depend on complicated feature extraction and matching pipelines that attempt to learn a model for each individual annotation. This work, however, shows that it is possible to annotate large datasets effectively without per-label models by combining information from low-level visual features with annotation mining of the data, a technique referred to here as annotation by mining. The method is especially effective in the presence of aliased, redundant data, a characteristic of social media sites and web content. This formulation addresses the problem in a way that is highly scalable and fast regardless of dictionary size. The work places particular emphasis on learning with graph theory: such an approach leads to algorithms that effectively combine disparate feature metrics by examining the stability and smoothness of a graph constructed in any metric space. Specifically, a concept of “graph smoothness” is formulated that reflects the distribution of different attributes across the graph. This smoothness measurement allows us to extract visual annotations and geographic place annotations, as well as to find weighting parameters for disparate similarity modalities. Analysis validates the approach on two sets of videos, one a collection of TRECVID news videos and the other crawled from YouTube, and on two image databases crawled from Flickr’s geotagged photos. The approach is shown to mine accurate annotations from noisy transcripts and noisily tagged social-media data while scaling to dictionary sizes of more than 430,000 words.
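
The abstract does not give the exact form of the smoothness measure, so the following is a minimal sketch of one common reading of the idea: build a k-nearest-neighbour graph over low-level visual features and score each candidate annotation term by the Laplacian quadratic form of its indicator vector, where small values mean neighbouring items tend to share the term. The helper names (knn_graph, graph_smoothness) and the toy data are illustrative assumptions, not the paper’s implementation.

```python
import numpy as np
from scipy.spatial.distance import cdist

def knn_graph(features, k=5):
    """Symmetric k-NN adjacency matrix over visual feature vectors
    (one row per item); a stand-in for the paper's similarity graph."""
    d = cdist(features, features)        # pairwise distances
    np.fill_diagonal(d, np.inf)          # no self-edges
    w = np.zeros_like(d)
    idx = np.argsort(d, axis=1)[:, :k]   # k nearest neighbours per node
    for i, nbrs in enumerate(idx):
        w[i, nbrs] = np.exp(-d[i, nbrs])  # similarity-weighted edges
    return np.maximum(w, w.T)            # symmetrize

def graph_smoothness(w, tag_sets, term):
    """Laplacian quadratic form x^T L x for the indicator vector of one
    term: sum over edges of w_ij * (x_i - x_j)^2, so lower means the
    term is spread smoothly over the graph (an assumed definition)."""
    x = np.array([1.0 if term in tags else 0.0 for tags in tag_sets])
    lap = np.diag(w.sum(axis=1)) - w     # unnormalized graph Laplacian
    return x @ lap @ x

# Toy usage: six items, 2-D "visual" features, noisily tagged.
feats = np.array([[0, 0], [0.1, 0], [0, 0.1],
                  [5, 5], [5.1, 5], [5, 5.1]])
tags = [{"beach"}, {"beach"}, {"beach", "car"},
        {"city"}, {"city"}, {"city"}]
w = knn_graph(feats, k=2)
for term in ("beach", "city", "car"):
    print(term, graph_smoothness(w, tags, term))
```

In this toy run, “beach” and “city” score near zero (each is uniform over its visual cluster) while the spurious “car” tag scores high, which is how a smoothness criterion could separate accurate annotations from noise; comparing such scores across feature spaces is one plausible way to derive the modality weights the abstract mentions.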