Multi-Task Learning (MTL) has emerged as a methodology in which multiple tasks are jointly learned by a shared learning algorithm, such as a DNN. MTL is based on the assumption that the tasks under consideration are related; it therefore exploits shared knowledge to improve performance on each individual task. Tasks are generally considered to be homogeneous, i.e., to refer to the same type of problem. Moreover, MTL is usually based on ground-truth annotations with full or partial overlap across tasks. In this work, we deal with heterogeneous MTL, simultaneously addressing detection, classification and regression problems. We explore task-relatedness as a means for co-training, in a weakly-supervised way, tasks whose annotations have little or even no overlap. Task-relatedness is introduced into MTL either explicitly, through prior expert knowledge, or through data-driven studies. We propose a novel distribution matching approach, in which knowledge exchange is enabled between tasks via matching of their predictions' distributions. Based on this approach, we build FaceBehaviorNet, the first framework for large-scale face analysis, by jointly learning all facial behavior tasks. We develop case studies for: i) continuous affect estimation, action unit detection and basic emotion recognition; ii) attribute detection and face identification. We illustrate that co-training via task-relatedness alleviates negative transfer. Since FaceBehaviorNet learns features that encapsulate all aspects of facial behavior, we conduct zero-/few-shot learning to perform tasks beyond the ones it has been trained for, such as compound emotion recognition. In a very large experimental study utilizing 10 databases, we illustrate that our approach outperforms, by large margins, the state-of-the-art in all tasks and in all databases, even in those that have not been used in its training.
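To make the distribution matching idea concrete, the following is a minimal sketch (not the exact FaceBehaviorNet formulation) of coupling an emotion-classification head and an action-unit-detection head through a relatedness prior: the emotion prediction induces an expected AU distribution, which is matched to the AU head's own prediction via a KL term. The prior matrix, tensor shapes and loss weighting are illustrative assumptions.

```python
# Hypothetical sketch of distribution matching between two task heads (PyTorch).
import torch
import torch.nn.functional as F

def coupling_loss(emotion_logits, au_logits, emo_to_au_prior):
    """Match the AU distribution implied by the emotion prediction
    to the AU head's own prediction via a Bernoulli KL divergence."""
    p_emo = F.softmax(emotion_logits, dim=-1)          # (B, n_emotions)
    implied_au = p_emo @ emo_to_au_prior                # (B, n_aus), prior entries in [0, 1]
    predicted_au = torch.sigmoid(au_logits)             # (B, n_aus)
    implied_au = implied_au.clamp(1e-6, 1 - 1e-6)
    predicted_au = predicted_au.clamp(1e-6, 1 - 1e-6)
    kl = implied_au * (implied_au / predicted_au).log() + \
         (1 - implied_au) * ((1 - implied_au) / (1 - predicted_au)).log()
    return kl.mean()

# Usage: total loss = supervised task losses + lambda * coupling_loss(...).
# emo_to_au_prior could encode expert knowledge (emotion-to-AU tables) or come
# from a data-driven co-occurrence study, as described in the abstract.
```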
Construction and fitting of Statistical Deformable Models (SDMs) is at the core of the computer vision and image analysis discipline. SDMs can be used to estimate an object's shape, pose, parts and landmarks using only static imagery captured from monocular cameras. One of the first and most popular families of SDMs is that of Active Appearance Models (AAMs). AAMs use a generative parameterization of object appearance and shape, and their fitting is usually conducted by solving a non-linear optimization problem. In this talk I will start with a brief introduction to AAMs and continue by describing supervised methods for AAM fitting. Subsequently, under this framework, I will motivate current techniques developed in my group that capitalize on the combined power of Deep Convolutional Neural Networks (DCNNs) and Recurrent Neural Networks (RNNs) for optimal deformable object modeling and fitting. Finally, I will show how we can extract the dense shape of objects by building and fitting 3D Morphable Models. Examples will be given in the publicly available toolbox of my group, called Menpo (http://www.menpo.org/).
A novel procedure is presented in this paper for training a deep convolutional and recurrent neural network, taking into account both the available training data set and information extracted from similar networks trained on other relevant data sets. This information is included in an extended loss function used for the network training, so that the network achieves improved performance when applied to the other data sets, without forgetting the knowledge learned from the original data set. Facial expression and emotion recognition in-the-wild is the test-bed application used to demonstrate the improved performance achieved with the proposed approach. In this framework, we provide an experimental study on categorical emotion recognition using datasets from a very recent related emotion recognition challenge.
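One concrete way such an extended loss can be realized is with a distillation-style soft-target term that transfers knowledge from a network trained on a related data set; the sketch below is a hedged illustration under that assumption, not the paper's exact loss, and the temperature and weighting values are placeholders.

```python
# Hypothetical sketch of an "extended" loss: supervised term on the original
# data set plus a soft-target transfer term from a reference network.
import torch
import torch.nn.functional as F

def extended_loss(student_logits, labels, reference_logits, weight=0.5, T=2.0):
    # Standard cross-entropy on the original, annotated data set
    ce = F.cross_entropy(student_logits, labels)
    # Soft targets produced by a network trained on a related data set
    p_ref = F.softmax(reference_logits / T, dim=-1)
    log_p_stu = F.log_softmax(student_logits / T, dim=-1)
    transfer = F.kl_div(log_p_stu, p_ref, reduction="batchmean") * (T * T)
    return ce + weight * transfer
```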
The de facto pipeline for facial landmark estimation involves running a face detector and subsequently fitting a deformable model on the detected bounding box. This encompasses two basic problems: i) the detection and deformable fitting steps are performed independently, so the detector might not provide the best-suited initialisation for the fitting step; ii) face appearance varies hugely across different poses, which makes deformable face fitting very challenging, so distinct models have to be used (e.g., one for profile and one for frontal faces). In this work, we propose the first, to the best of our knowledge, joint multi-view convolutional network to handle large pose variations across faces in-the-wild, and to elegantly bridge the face detection and facial landmark localisation tasks. Existing joint face detection and landmark localisation methods focus only on a very small set of landmarks. By contrast, our method can detect and align a large number of landmarks for semi-frontal (68 landmarks) and profile (39 landmarks) faces. We evaluate our model on a plethora of datasets, including standard static image datasets such as IBUG, 300W, COFW, and the latest Menpo Benchmark for both semi-frontal and profile faces. Significant improvement over state-of-the-art methods in deformable face tracking is witnessed on the 300VW benchmark. We also demonstrate state-of-the-art results for face detection on the FDDB and MALF datasets.
Objective. The patterns of brain activity associated with different brain processes can be used to identify different brain states and make behavioural predictions. However, the relevant features are not readily apparent and accessible. Our aim is to design a system for learning informative latent representations from multichannel recordings of ongoing EEG activity. Approach. We propose a novel differentiable decoding pipeline consisting of learnable filters and a pre-determined feature extraction module. Specifically, we introduce filters parameterized by generalized Gaussian functions that offer a smooth derivative for stable end-to-end model training and allow for learning interpretable features. For the feature module, we use signal magnitude and functional connectivity estimates. Main results. We demonstrate the utility of our model on a new EEG dataset of unprecedented size (i.e. 721 subjects), where we identify consistent trends of music perception and related individual differences. Furthermore, we train and apply our model on two additional datasets, specifically for emotion recognition on SEED and workload classification on the simultaneous task EEG workload dataset. The discovered features align well with previous neuroscience studies and offer new insights, such as marked differences in the functional connectivity profile between the left and right temporal areas during music listening. This agrees with the specialisation of the temporal lobes regarding music perception proposed in the literature. Significance. The proposed method offers strong interpretability of learned features while reaching levels of accuracy similar to those achieved by black-box deep learning models. This improved trustworthiness may promote the use of deep learning models in real-world applications. The model code is available at https://github.com/SMLudwig/EEGminer/.
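The key differentiable component is the filter with a generalized Gaussian magnitude response. Below is a minimal sketch, assuming a frequency-domain parameterization with a learnable centre frequency, bandwidth and shape exponent applied via the FFT; the parameter names and exact parameterization are illustrative assumptions rather than the released EEGminer code.

```python
# Hypothetical sketch of a learnable generalized-Gaussian band-pass filter (PyTorch).
import torch

class GeneralizedGaussianFilter(torch.nn.Module):
    def __init__(self, n_samples, fs, f_center=10.0, f_width=2.0, beta=2.0):
        super().__init__()
        # Frequency grid of the real FFT (fixed), filter parameters (learnable)
        self.register_buffer("freqs", torch.fft.rfftfreq(n_samples, d=1.0 / fs))
        self.f_center = torch.nn.Parameter(torch.tensor(float(f_center)))
        self.f_width = torch.nn.Parameter(torch.tensor(float(f_width)))
        self.beta = torch.nn.Parameter(torch.tensor(float(beta)))

    def forward(self, x):  # x: (batch, channels, n_samples)
        # Smooth, differentiable magnitude response exp(-|(f - mu)/sigma|^beta)
        gain = torch.exp(-((self.freqs - self.f_center).abs()
                           / self.f_width.clamp(min=1e-3)) ** self.beta.clamp(min=1.0))
        X = torch.fft.rfft(x, dim=-1)
        return torch.fft.irfft(X * gain, n=x.shape[-1], dim=-1)
```

The filtered signals could then feed the pre-determined feature module (signal magnitude and functional connectivity estimates), keeping the whole pipeline trainable end to end.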
Developing powerful deformable face models requires massive, annotated face databases on which techniques can be trained, validated and tested. Manual annotation of each facial image in terms of landmarks requires a trained expert and the workload is usually enormous. Fatigue is one of the reasons that annotations are, in some cases, inaccurate. This is why the majority of existing facial databases provide annotations for only a relatively small subset of the training images. Furthermore, there is hardly any correspondence between the annotated landmarks across different databases. These problems make cross-database experiments almost infeasible. To overcome these difficulties, we propose a semi-automatic annotation methodology for annotating massive face datasets. This is the first attempt to create a tool suitable for annotating massive facial databases. We employed our tool to create annotations for the MultiPIE, XM2VTS, AR, and FRGC Ver. 2 databases. The annotations will be made publicly available at http://ibug.doc.ic.ac.uk/resources/facial-point-annotations/. Finally, we present experiments which verify the accuracy of the produced annotations.
The human face is the most well-researched object in computer vision, mainly because (1) it is a highly deformable object whose appearance changes dramatically under different poses, expressions, illumination, etc., (2) the applications of face recognition are numerous and span several fields, and (3) it is widely known that humans possess the ability to perform facial analysis, especially identity recognition, extremely efficiently and accurately. Although a lot of research has been conducted in past years, the problem of face recognition using images captured in uncontrolled environments, including several illumination and/or pose variations, still remains open. This is also attributed to the existence of outliers (such as partial occlusion, cosmetics, eyeglasses, etc.) or changes due to age. In this chapter, the authors provide an overview of the existing fully automatic face recognition technologies for uncontrolled scenarios. They present the existing databases, summarize the challenges that arise in such scenarios, and conclude by presenting the opportunities that exist in the field.
Deep face recognition has achieved remarkable improvements due to the introduction of margin-based softmax loss, in which the prototype stored in the last linear layer represents the center of each class. In these methods, training samples are enforced to be close to positive prototypes and far apart from negative prototypes by a clear margin. However, we argue that prototype learning only employs sample-to-prototype comparisons without considering sample-to-sample comparisons during training, and the low loss value gives us an illusion of perfect feature embedding, impeding the further exploration of SGD. To this end, we propose Variational Prototype Learning (VPL), which represents every class as a distribution instead of a point in the latent space. By identifying the slow feature drift phenomenon, we directly inject memorized features into prototypes to approximate variational prototype sampling. The proposed VPL can simulate sample-to-sample comparisons within the classification framework, encouraging the SGD solver to be more exploratory, while boosting performance. Moreover, VPL is conceptually simple, easy to implement, computationally efficient, and memory-saving. We present extensive experimental results on popular benchmarks, which demonstrate the superiority of the proposed VPL method over the state-of-the-art competitors.
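The mechanism can be illustrated with a small sketch of a margin-based softmax head whose prototypes are mixed with a memory of recently seen features before computing logits, approximating sampling from a per-class distribution. The margin values, mixing weight and memory update rule below are simplified assumptions, not the exact VPL implementation.

```python
# Hypothetical sketch of variational prototypes on top of an ArcFace-style margin loss.
import torch
import torch.nn.functional as F

class VPLHead(torch.nn.Module):
    def __init__(self, feat_dim, n_classes, s=64.0, m=0.5, lambda_mix=0.15):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(n_classes, feat_dim))
        self.register_buffer("feature_memory", torch.zeros(n_classes, feat_dim))
        self.s, self.m, self.lambda_mix = s, m, lambda_mix

    def forward(self, features, labels):
        feats = F.normalize(features)
        # Inject memorized sample features into the class prototypes
        protos = F.normalize((1 - self.lambda_mix) * F.normalize(self.weight)
                             + self.lambda_mix * self.feature_memory)
        cos = feats @ protos.t()                              # (B, n_classes)
        theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
        target = F.one_hot(labels, cos.size(1)).bool()
        logits = torch.where(target, torch.cos(theta + self.m), cos) * self.s
        # Refresh the memory with the current (slowly drifting) features
        with torch.no_grad():
            self.feature_memory[labels] = feats.detach()
        return F.cross_entropy(logits, labels)
```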
The methods introduced so far regarding discriminant non-negative matrix factorization (DNMF) do not guarantee convergence to a stationary limit point. In order to remedy this limitation, a novel DNMF method is presented that uses projected gradients. The proposed algorithm employs some extra modifications that make the method more suitable for classification tasks. The usefulness of the proposed technique in frontal face verification and facial expression recognition problems is demonstrated.
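For readers unfamiliar with the projected-gradient idea, the sketch below shows one update for the basic NMF subproblem (minimizing ||X - WH||_F^2 subject to W, H >= 0): a gradient step followed by projection onto the non-negative orthant. The discriminant terms that DNMF adds to the objective are omitted here, and the step size is an illustrative placeholder.

```python
# Minimal sketch of a projected-gradient NMF update (Python/NumPy).
import numpy as np

def projected_gradient_step(X, W, H, lr=1e-3):
    residual = W @ H - X
    grad_W = residual @ H.T               # gradient of 0.5 * ||X - W H||_F^2 w.r.t. W
    grad_H = W.T @ residual               # gradient w.r.t. H
    W = np.maximum(W - lr * grad_W, 0.0)  # gradient step, then projection onto W >= 0
    H = np.maximum(H - lr * grad_H, 0.0)  # gradient step, then projection onto H >= 0
    return W, H
```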