    Revisiting Foreground and Background Separation in Weakly-supervised Temporal Action Localization: A Clustering-based Approach
    Citations: 9 | References: 59 | Related papers: 10
    Abstract:
    Weakly-supervised temporal action localization aims to localize action instances in videos with only video-level action labels. Existing methods mainly embrace a localization-by-classification pipeline that optimizes the snippet-level prediction with a video classification loss. However, this formulation suffers from the discrepancy between classification and detection, resulting in inaccurate separation of foreground and background (F&B) snippets. To alleviate this problem, we propose to explore the underlying structure among the snippets by resorting to unsupervised snippet clustering, rather than heavily relying on the video classification loss. Specifically, we propose a novel clustering-based F&B separation algorithm. It comprises two core components: a snippet clustering component that groups the snippets into multiple latent clusters and a cluster classification component that further classifies the cluster as foreground or background. As there are no ground-truth labels to train these two components, we introduce a unified self-labeling mechanism based on optimal transport to produce high-quality pseudo-labels that match several plausible prior distributions. This ensures that the cluster assignments of the snippets can be accurately associated with their F&B labels, thereby boosting the F&B separation. We evaluate our method on three benchmarks: THUMOS14, ActivityNet v1.2 and v1.3. Our method achieves promising performance on all three benchmarks while being significantly more lightweight than previous methods. Code is available at https://github.com/Qinying-Liu/CASE
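The self-labeling step in the abstract can be illustrated with entropy-regularized optimal transport solved by Sinkhorn iterations. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `sinkhorn_pseudo_labels`, the parameters `epsilon` and `n_iters`, and the single cluster prior are all hypothetical, and the abstract does not say which solver or priors the method actually uses.

```python
import numpy as np

def sinkhorn_pseudo_labels(logits, prior, epsilon=0.05, n_iters=50):
    """Map snippet-to-cluster scores to soft pseudo-labels.

    Assumed setup (not taken from the paper): every snippet carries equal
    mass, and `prior` is the desired share of snippets per cluster.
    The column means of the result equal `prior`; the rows sum to
    approximately 1 once the iterations have converged.
    """
    logits = np.asarray(logits, dtype=float)
    n_snippets, n_clusters = logits.shape
    # Gibbs kernel; subtracting the max keeps the exponent non-positive.
    kernel = np.exp((logits - logits.max()) / epsilon)
    row_marginal = np.full(n_snippets, 1.0 / n_snippets)
    col_marginal = np.asarray(prior, dtype=float)
    col_marginal = col_marginal / col_marginal.sum()
    u = np.ones(n_snippets)
    v = np.ones(n_clusters)
    for _ in range(n_iters):
        u = row_marginal / (kernel @ v)    # rescale rows toward equal mass
        v = col_marginal / (kernel.T @ u)  # rescale columns toward the prior
    plan = u[:, None] * kernel * v[None, :]
    # Multiply by the snippet count so each row reads as a label distribution.
    return plan * n_snippets
```

Taking the `argmax` of each row would give hard cluster assignments. A smaller `epsilon` produces sharper labels but generally needs more iterations to converge.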
    Keywords:
    Snippet
    Boosting
    Code (set theory)
    Component (thermodynamics)
    This paper investigates the effect of a snippet on users' relevance judgment of a document. Web search engines have become very useful tools in daily life. One of the most important features of modern search engines is the snippet, which is expected to help users find relevant pages immediately. Improving snippet quality is therefore important for minimizing the cost of the user feedback required to find relevant pages. We compare users' relevance judgments when a snippet is provided with their judgments when it is not. The experimental results show that providing snippets reduces users' judgment time while maintaining judgment accuracy. They also show that snippet length has a strong effect on judgment time when judging relevant documents. These results will contribute to improving snippet generation methods in terms of minimal user feedback.
    Snippet
    Relevance
    Citations (0)
    In this paper, we address video-based person re-identification with competitive snippet-similarity aggregation and co-attentive snippet embedding. Our approach divides long person sequences into multiple short video snippets and aggregates the top-ranked snippet similarities for sequence-similarity estimation. With this strategy, the intra-person visual variation of each sample can be minimized for similarity estimation, while the diverse appearance and temporal information are maintained. The snippet similarities are estimated by a deep neural network with a novel temporal co-attention for snippet embedding. The attention weights are obtained based on a query feature, which is learned from the whole probe snippet by an LSTM network, making the resulting embeddings less affected by noisy frames. The gallery snippet shares the same query feature with the probe snippet. Thus the embedding of the gallery snippet can present more relevant features to compare with the probe snippet, yielding more accurate snippet similarity. Extensive ablation studies verify the effectiveness of competitive snippet-similarity aggregation as well as the temporal co-attentive embedding. Our method significantly outperforms the current state-of-the-art approaches on multiple datasets.
    Snippet
    Similarity (geometry)
    Feature (linguistics)
    Identification
    Citations (218)
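The idea of aggregating only the top-ranked snippet similarities, as described in the person re-identification abstract above, can be sketched as follows. This is an illustration under assumptions: the abstract estimates snippet similarities with a learned network, whereas this sketch swaps in plain cosine similarity, and the function name `sequence_similarity` and the parameter `top_k` are hypothetical.

```python
import numpy as np

def sequence_similarity(probe_snippets, gallery_snippets, top_k=3):
    """Average the top-k snippet-pair similarities between two sequences.

    Each input is an array of shape (n_snippets, feature_dim). Cosine
    similarity stands in for the learned similarity in the abstract.
    """
    probe = np.asarray(probe_snippets, dtype=float)
    gallery = np.asarray(gallery_snippets, dtype=float)
    probe = probe / np.linalg.norm(probe, axis=1, keepdims=True)
    gallery = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = probe @ gallery.T                # all probe/gallery snippet pairs
    k = min(top_k, sims.size)
    top = np.sort(sims, axis=None)[-k:]     # keep only the k best-matching pairs
    return float(top.mean())
```

Keeping only the best-matching pairs means that snippets with a very different pose or occlusion do not drag the sequence-level score down, which is the motivation the abstract gives for the strategy.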
    Click-through rate (CTR) is a key signal of relevance for search engine results, both organic and sponsored. CTR of a result has two core components: (a) the probability of examination of a result by a user, and (b) the perceived relevance of the result given that it has been examined by the user. There has been considerable work on user browsing models, to model and analyze both the examination and the relevance components of CTR. In this paper, we propose a novel formulation: a micro-browsing model for how users read result snippets. The snippet text of a result often plays a critical role in the perceived relevance of the result. We study how particular words within a line of a snippet can influence user behavior. We validate this new micro-browsing user model by considering the problem of predicting which snippet will yield higher CTR, and show that classification accuracy is dramatically higher with our micro-browsing user model. The key insight in this paper is that varying relatively few words within a snippet, and even their location within a snippet, can have a significant influence on the clickthrough of a snippet.
    Snippet
    Relevance
    Citations (1)
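The two CTR components named in the click-through abstract above are often combined multiplicatively, on the assumption that a user clicks only if the result is examined and then perceived as relevant. The abstract itself does not state this formula, so the sketch below is an assumption, and the function name `expected_ctr` is hypothetical.

```python
def expected_ctr(p_examined: float, p_relevant_given_examined: float) -> float:
    """CTR under the assumption: click = examined AND perceived relevant."""
    for p in (p_examined, p_relevant_given_examined):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_examined * p_relevant_given_examined
```

Under this reading, a snippet rewrite changes only the second factor, while the position of the result on the page mainly affects the first.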
    Click-through rate (CTR) is a key signal of relevance for search engine results, both organic and sponsored. CTR of a result has two core components: (a) the probability of examination of a result by a user, and (b) the perceived relevance of the result given that it has been examined by the user. There has been considerable work on user browsing models, to model and analyze both the examination and the relevance components of CTR. In this paper, we propose a novel formulation: a micro-browsing model for how users read result snippets. The snippet text of a result often plays a critical role in the perceived relevance of the result. We study how particular words within a line of a snippet can influence user behavior. We validate this new micro-browsing user model by considering the problem of predicting which snippet will yield higher CTR, and show that classification accuracy is dramatically higher with our micro-browsing user model. The key insight in this paper is that varying relatively few words within a snippet, and even their location within a snippet, can have a significant influence on the clickthrough of a snippet.
    Snippet
    Relevance
    Citations (0)
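    With the recent growth of media platforms, untrimmed videos have become easy to collect and access. Accordingly, temporal action detection, which finds the start and end of actions in untrimmed videos, has recently been studied actively for video understanding. Temporal action proposal generation methods define action intervals using a Temporal Convolutional Network. In contrast, this paper proposes a method that uses an LSTM to model actions according to their temporal order of occurrence. The proposed method uses the LSTM to evaluate snippet relatedness and defines action intervals from it. Snippet relatedness is a measure of whether snippets belong to the same action interval. In experiments on the THUMOS-14 dataset, the proposed method achieves an average recall of 41.34% when extracting 50 action proposals, outperforming BSN and MGG by 3.88% and 1.41% respectively, while falling 0.85% short of SRG.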
    Snippet
    The recent success of the Transformer has provided a new direction for various visual understanding tasks, including video-based facial expression recognition (FER). By modeling visual relations effectively, the Transformer has shown its power for describing complicated patterns. However, it still struggles to capture subtle facial expression movements, because the expression movements in many videos are too small for it to extract meaningful spatial-temporal relations and achieve robust performance. To this end, we propose to decompose each video into a series of expression snippets, each of which contains a small number of facial movements, and attempt to augment the Transformer's ability to model intra-snippet and inter-snippet visual relations, respectively, obtaining the Expression snippet Transformer (EST). In particular, for intra-snippet modeling, we devise an attention-augmented snippet feature extractor (AA-SFE) to enhance the encoding of subtle facial movements of each snippet by gradually attending to more salient information. In addition, for inter-snippet modeling, we introduce a shuffled snippet order prediction (SSOP) head and a corresponding loss to improve the modeling of subtle motion changes across subsequent snippets by training the Transformer to identify shuffled snippet orders. Extensive experiments on four challenging datasets (i.e., BU-3DFE, MMI, AFEW, and DFEW) demonstrate that our EST is superior to other CNN-based methods, obtaining state-of-the-art performance.
    Snippet
    Citations (0)
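    Presenting the results that users want has long been treated as an important problem in web search. Various elements are included in search results so that they are displayed effectively and in diverse ways. Among these elements, the snippet, a brief representative text of a web page, is known to have an important influence on whether a user visits the page. A snippet should consist of sentences that both reflect the user's intent well and best represent the content of the web page. Existing snippet generation methods mainly extracted snippets by considering only the frequency of query terms or the similarity to the title, which is limited by the ambiguity of titles and by reliance on the query terms alone. In this paper, to generate sentences that best represent the content of a web page, we perform principal component analysis (PCA) with the page's words as axes. Through PCA, we find the axis components that most influence the page and use them for snippet generation, and we demonstrate that this approach is effective.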
    Snippet
    Citations (0)
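The PCA-based snippet generation described in the abstract above can be sketched as follows. This is one possible reading, not the paper's method: the abstract does not say how the principal components are turned into a snippet, so scoring each sentence by its projection onto the first component is an assumption, as are the whitespace tokenizer and the function name `pca_snippet`.

```python
import numpy as np

def pca_snippet(sentences, n_sentences=1):
    """Pick the sentences that load most heavily on the first principal
    component of the page's sentence-by-word count matrix."""
    if not sentences:
        return []
    tokenized = [s.lower().split() for s in sentences]
    vocab = sorted({w for tokens in tokenized for w in tokens})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(sentences), len(vocab)))
    for row, tokens in enumerate(tokenized):
        for w in tokens:
            counts[row, index[w]] += 1      # words are the axes
    centered = counts - counts.mean(axis=0)
    # Rows of vt are the principal axes in word space, strongest first.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = np.abs(centered @ vt[0])       # projection onto the first axis
    best = np.argsort(-scores)[:n_sentences]
    return [sentences[i] for i in sorted(best)]  # keep page order
```

A realistic version would also need stop-word removal and term weighting, since raw counts let frequent function words dominate the first component.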