Activity Recognition Applications from Contextual Video-Text Fusion

2015 
In this paper, we demonstrate our capabilities in fusing information extracted from correlated video and text documents. We generate a probabilistic association between entities mentioned in text and detected in video by jointly optimizing a measure of appearance and behavior similarity. We manage the uncertainty that arises from non-overlapping (conflicting) features in the sources by maintaining multiple hypotheses. On synthetic data with few overlapping features between sources, our soft-fusion method improves activity recognition scores over both single-source processing and non-probabilistic (hard) fusion. When sources share more than 60% of their features, hard fusion outperforms both single-source processing and soft fusion. Our approach determines whether soft or hard fusion is appropriate for a given dataset and selects the fusion algorithm that yields the highest activity recognition scores.
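The selection rule described above can be sketched as a simple overlap test. This is an illustrative sketch only, not the authors' implementation: the function names and the overlap measure (shared features relative to the smaller feature set) are assumptions; only the 60% threshold comes from the abstract.

```python
def feature_overlap(video_feats, text_feats):
    """Fraction of features shared by both sources, relative to the
    smaller feature set (an assumed measure for illustration)."""
    shared = set(video_feats) & set(text_feats)
    return len(shared) / min(len(video_feats), len(text_feats))

def select_fusion(video_feats, text_feats, threshold=0.6):
    """Pick hard fusion when the sources overlap heavily (per the
    abstract's 60% figure), otherwise fall back to soft fusion."""
    if feature_overlap(video_feats, text_feats) > threshold:
        return "hard"
    return "soft"

# Hypothetical feature sets: 1 of the 2 text features overlaps (0.5 <= 0.6)
print(select_fusion({"color", "size", "gait", "pose"}, {"color", "role"}))
```

Under this sketch, sources with mostly disjoint features are routed to the probabilistic (soft) path, where multiple hypotheses absorb the conflicting evidence.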