Unsupervised Video Summarization with Attentive Conditional Generative Adversarial Networks
Abstract:
With the rapid growth of video data, video summarization techniques play a key role in reducing the effort needed to explore video content by generating concise but informative summaries. Though supervised video summarization approaches have been well studied and achieve state-of-the-art performance, unsupervised methods are still in high demand due to the intrinsic difficulty of obtaining high-quality annotations. In this paper, we propose a novel yet simple unsupervised video summarization method with attentive conditional Generative Adversarial Networks (GANs). First, we build our framework upon GANs in an unsupervised manner: the generator produces high-level weighted frame features and predicts frame-level importance scores, while the discriminator tries to distinguish between weighted frame features and raw frame features. Furthermore, we utilize a conditional feature selector to guide the GAN model to focus on the more important temporal regions of the whole video. Second, we are the first to introduce frame-level multi-head self-attention for video summarization, which learns long-range temporal dependencies along the whole video sequence and overcomes the local constraints of recurrent units, e.g., LSTMs. Extensive evaluations on two datasets, SumMe and TVSum, show that our proposed framework surpasses state-of-the-art unsupervised methods by a large margin, and even outperforms most of the supervised methods. Additionally, we conduct an ablation study to unveil the influence of each component and of the parameter settings in our framework.
Keywords: Discriminator; Margin (machine learning); Feature (linguistics); Key frame; Feature learning; Generative model
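To make the architecture concrete, here is a minimal PyTorch sketch of the generator/discriminator interplay described in the abstract. The layer sizes, the sigmoid score head, and the plain BCE adversarial loss are illustrative assumptions, and the conditional feature selector is omitted; this is a sketch of the idea, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Predicts frame-level importance scores with multi-head self-attention."""
    def __init__(self, feat_dim=1024, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.score_head = nn.Sequential(nn.Linear(feat_dim, 1), nn.Sigmoid())

    def forward(self, frames):                      # frames: (B, T, feat_dim)
        ctx, _ = self.attn(frames, frames, frames)  # long-range temporal context
        scores = self.score_head(ctx).squeeze(-1)   # (B, T) importance in [0, 1]
        return frames * scores.unsqueeze(-1), scores  # weighted features, scores

class Discriminator(nn.Module):
    """Tries to tell raw frame features from generator-weighted ones."""
    def __init__(self, feat_dim=1024):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 256),
                                 nn.LeakyReLU(0.2),
                                 nn.Linear(256, 1))

    def forward(self, feats):                       # feats: (B, T, feat_dim)
        return self.net(feats).mean(dim=1)          # one realness logit per video

# One adversarial step with a plain BCE GAN loss (an assumption):
G, D, bce = Generator(), Discriminator(), nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)

frames = torch.randn(2, 120, 1024)                 # e.g. CNN features of 120 frames
weighted, scores = G(frames)

# Discriminator step: raw features are "real", weighted features are "fake".
loss_d = (bce(D(frames), torch.ones(2, 1)) +
          bce(D(weighted.detach()), torch.zeros(2, 1)))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: make the weighted features indistinguishable from raw ones.
loss_g = bce(D(G(frames)[0]), torch.ones(2, 1))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

Because the scores come from self-attention rather than a recurrent unit, each frame's importance can depend on the whole sequence at once, which is the stated advantage over LSTM-style recurrence.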
Identifying anomalous samples in highly complex and unstructured data is a crucial but challenging task in a variety of intelligent systems. In this paper, we present a novel deep anomaly detection framework named AnoDM (standing for Anomaly detection based on unsupervised Disentangled representation learning and Manifold learning). Disentanglement learning is implemented with a β-VAE, which automatically discovers interpretable factorized latent representations in a completely unsupervised manner; manifold learning is realized by t-SNE, which projects the latent representations to a 2D map. We define a new anomaly score function by combining the β-VAE's reconstruction error in the raw feature space with a local density estimate in the t-SNE space. AnoDM was evaluated on both image and time-series data and achieved better results than both single-measure variants and other deep learning methods.
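A rough sketch of how such a two-term score could be computed, assuming a trained (β-)VAE has already produced reconstructions and latent codes; the k-nearest-neighbour density proxy and the mixing weight `alpha` are our illustrative choices, not the paper's exact estimator.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.neighbors import NearestNeighbors

def anodm_style_scores(x, recon, latents, k=10, alpha=0.5):
    """x, recon: (N, D) raw inputs and VAE reconstructions; latents: (N, d)."""
    # Term 1: per-sample reconstruction error in the raw feature space.
    rec_err = np.mean((x - recon) ** 2, axis=1)

    # Term 2: local sparsity in the 2D t-SNE map of the latent codes --
    # a large mean distance to the k nearest neighbours means low density.
    z2d = TSNE(n_components=2).fit_transform(latents)
    dist, _ = NearestNeighbors(n_neighbors=k + 1).fit(z2d).kneighbors(z2d)
    sparsity = dist[:, 1:].mean(axis=1)     # column 0 is the self-distance

    # Blend the two normalized terms into one anomaly score per sample.
    norm = lambda v: (v - v.min()) / (v.max() - v.min() + 1e-12)
    return alpha * norm(rec_err) + (1 - alpha) * norm(sparsity)
```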
Unsupervised learning methods have recently shown their competitiveness against supervised training. Typically, these methods use a single objective to train the entire network. But one distinct advantage of unsupervised over supervised learning is that the former possesses more variety and freedom in designing the objective. In this work, we explore new dimensions of unsupervised learning by proposing the Progressive Stage-wise Learning (PSL) framework. For a given unsupervised task, we design multi-level tasks and define different learning stages for the deep network. Early learning stages are forced to focus on low-level tasks, while late stages are guided to extract deeper information through harder tasks. We discover that unsupervised feature representations can be effectively enhanced by progressive stage-wise learning. Our extensive experiments show that PSL consistently improves results for the leading unsupervised learning methods.
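The stage schedule itself can be summarized in a few lines; a minimal sketch, assuming a list of pretext objectives ordered from low-level to harder tasks (the task list and epoch budgets are placeholders, not the paper's actual hierarchy):

```python
def train_progressive(model, optimizer, stages):
    """stages: list of (name, loss_fn, dataloader, epochs), ordered easy -> hard."""
    for name, loss_fn, loader, epochs in stages:
        print(f"stage: {name}")                    # e.g. low-level -> harder pretext tasks
        for _ in range(epochs):
            for batch in loader:
                optimizer.zero_grad()
                loss_fn(model, batch).backward()   # stage-specific pretext objective
                optimizer.step()
```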
Introduction: Representations play an essential role in learning, in both artificial and biological systems, by producing informative structures associated with characteristic patterns in the sensory environment. In this work, we examined unsupervised latent representations of images of basic geometric shapes with neural network models of unsupervised generative self-learning.
Background: Unsupervised concept learning with generative neural network models.
Objective: Investigation of the structure, geometry, and topology of the latent representations of generative models that emerge as a result of unsupervised self-learning with minimization of generative error; examination of the capacity of generative models to abstract and generalize essential data characteristics, including the type of shape, size, contrast, position, and orientation.
Methods: Generative neural network models, direct visualization, density clustering, and probing and scanning of latent positions and regions.
Results: Structural consistency of latent representations; geometrical and topological characteristics of latent representations examined and analysed with unsupervised methods; development and verification of methods for unsupervised analysis of latent representations.
Conclusion: Generative models can be instrumental in producing informative, compact representations of complex sensory data correlated with characteristic patterns.
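As an illustration of the analysis methods listed above, the sketch below clusters latent codes by density and probes a latent region by decoding along one latent dimension; the `encoder`/`decoder` interfaces and the DBSCAN settings are assumptions for illustration, not the study's actual tooling.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_latents(encoder, images, eps=0.5, min_samples=10):
    """Density clustering of latent codes; label -1 marks low-density outliers."""
    z = encoder(images)                        # latent codes, shape (N, d)
    return z, DBSCAN(eps=eps, min_samples=min_samples).fit_predict(z)

def scan_latent_axis(decoder, z0, axis, span=3.0, steps=7):
    """Probe a latent region: decode points along one latent dimension."""
    zs = np.repeat(z0[None, :], steps, axis=0)
    zs[:, axis] = np.linspace(-span, span, steps)
    return decoder(zs)                         # images along the traversal
```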
Unsupervised representation learning aims at describing raw data efficiently so as to solve various downstream tasks. It has been approached with many techniques, such as manifold learning, diffusion maps, and, more recently, self-supervised learning. These techniques are arguably all based on the underlying assumption that target functions, associated with future downstream tasks, have low variations in densely populated regions of the input space. Unveiling minimal variations as a guiding principle behind unsupervised representation learning paves the way to better practical guidelines for self-supervised learning algorithms.
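One hedged way to make the low-variation assumption operational is a consistency penalty that discourages the learned function from varying under small input perturbations; this is our illustrative reading of the principle, not a method proposed in the paper:

```python
import torch

def variation_penalty(f, x, sigma=0.05):
    """Penalize variation of f between an input and a nearby neighbour."""
    x_near = x + sigma * torch.randn_like(x)   # stays in the dense region around x
    return ((f(x) - f(x_near)) ** 2).mean()    # small value => locally flat f
```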
The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, auto-encoders, manifold learning, and deep networks. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation and manifold learning.
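As one concrete instance of the auto-encoder family surveyed here, a minimal denoising autoencoder in PyTorch; the sizes and noise level are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

class DenoisingAE(nn.Module):
    def __init__(self, dim=784, hidden=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.dec = nn.Linear(hidden, dim)

    def forward(self, x, noise=0.3):
        x_corrupt = x + noise * torch.randn_like(x)   # corrupt the input...
        return self.dec(self.enc(x_corrupt))          # ...and reconstruct the clean x

model = DenoisingAE()
x = torch.rand(32, 784)
loss = nn.functional.mse_loss(model(x), x)            # unsupervised reconstruction objective
```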
From the intuitive notion of disentanglement, the image variations corresponding to different factors should be distinct from each other, and a disentangled representation should reflect those variations with separate dimensions. To discover the factors and learn a disentangled representation, previous methods typically leverage an extra regularization term when learning to generate realistic images. However, this term usually results in a trade-off between disentanglement and generation quality. For generative models pretrained without any disentanglement term, the generated images show semantically meaningful variations when traversing along different directions in the latent space. Based on this observation, we argue that it is possible to mitigate the trade-off by (i) leveraging pretrained generative models with high generation quality and (ii) focusing on discovering the traversal directions as factors for disentangled representation learning. To achieve this, we propose Disentanglement via Contrast (DisCo), a framework that models the variations based on the target disentangled representations and contrasts the variations to jointly discover disentangled directions and learn disentangled representations. DisCo achieves state-of-the-art disentangled representation learning and distinct direction discovery, given pretrained non-disentangled generative models including GANs, VAEs, and Flows. Source code is at https://github.com/xrenaa/DisCo.
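A simplified reading of the direction-discovery idea, sketched below: learn K candidate directions for a frozen pretrained generator and train an encoder/classifier pair to tell apart the image variations each direction induces. The classification proxy stands in for DisCo's actual contrastive objective; `G`, `encoder`, and `classifier` are assumed components, not the released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

K, zdim = 16, 128
directions = nn.Parameter(torch.randn(K, zdim))        # candidate latent factors

def direction_loss(G, encoder, classifier, batch=8, step=2.0):
    """G: frozen pretrained generator; classifier maps feature deltas to K logits."""
    z = torch.randn(batch, zdim)
    k = torch.randint(0, K, (batch,))                  # pick one direction per sample
    d = F.normalize(directions, dim=1)[k]
    delta = encoder(G(z + step * d)) - encoder(G(z))   # variation the direction causes
    return F.cross_entropy(classifier(delta), k)       # directions must stay tellable apart
```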
Unsupervised learning is a useful way to train neural networks, but unsupervised learning algorithms are rare. The generative model is an interesting class of algorithm that can generate data similar to the sample data by building a probabilistic model of the input data, and it can be used for unsupervised learning. The variational autoencoder is a typical generative model; it differs from a common autoencoder in that a probabilistic parameter layer follows the hidden layer. New data can be reconstructed from the probabilistic model parameters, which are the latent variables. In this paper, we investigate how well the variational autoencoder reconstructs data under different numbers of latent variables. According to the simulations, the more latent variables there are, the more styles of the sample can be reconstructed.
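For concreteness, a minimal PyTorch VAE sketch in which `latent_dim` is the number of latent variables whose effect the paper studies; the layer sizes are arbitrary assumptions.

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, dim=784, latent_dim=8):
        super().__init__()
        self.enc = nn.Linear(dim, 256)
        self.mu = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)       # the probabilistic parameter layer
        self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, dim), nn.Sigmoid())

    def forward(self, x):
        h = torch.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization trick
        return self.dec(z), mu, logvar

def vae_loss(x, recon, mu, logvar):
    rec = nn.functional.binary_cross_entropy(recon, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kld
```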