In this paper, we focus on the problem of clustering faces in videos. Unlike traditional clustering of a collection of facial images, a video provides some inherent benefits: faces in the same face track must belong to the same person, and faces appearing in the same video frame cannot belong to the same person. These benefits can be used to enhance clustering performance. More precisely, we convert them into must-link and cannot-link constraints, which are then effectively incorporated into our novel algorithm, Video Face Clustering via Constrained Sparse Representation (CS-VFC). CS-VFC utilizes the constraints in two stages: sparse representation and spectral clustering. Experiments on real-world videos show the improvements of our algorithm over state-of-the-art methods.
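The track and frame cues described above can be encoded as binary must-link and cannot-link matrices before clustering; a minimal sketch (the function name and data layout are illustrative, not from the paper):

```python
import numpy as np

def build_constraints(tracks, frames, n_faces):
    """Build must-link (M) and cannot-link (C) constraint matrices.

    tracks: list of lists of face indices, one list per face track
            (all faces in a track share the same identity -> must-link).
    frames: list of lists of face indices, one list per video frame
            (distinct faces in one frame are different people -> cannot-link).
    """
    M = np.zeros((n_faces, n_faces), dtype=int)
    C = np.zeros((n_faces, n_faces), dtype=int)
    for track in tracks:
        for i in track:
            for j in track:
                if i != j:
                    M[i, j] = 1
    for frame in frames:
        for i in frame:
            for j in frame:
                if i != j:
                    C[i, j] = 1
    return M, C

# toy example: 4 faces, one track {0, 1}, one frame showing faces {1, 2}
M, C = build_constraints([[0, 1]], [[1, 2]], 4)
```

Such matrices can then bias both the sparse-coding and the spectral-clustering stages toward constraint-consistent assignments.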
This work presents a simple yet effective model for multi-view metric learning, which aims to improve the classification of data with multiple views, e.g., multiple modalities or multiple types of features. The intrinsic correlation among views, which all describe the same set of instances, makes it both possible and necessary to jointly learn the metrics of different views. Accordingly, we propose a multi-view metric learning method based on Fisher discriminant analysis (FDA) and the Hilbert-Schmidt Independence Criterion (HSIC), termed Fisher-HSIC Multi-View Metric Learning (FISH-MML). In our approach, class separability is enforced in the spirit of FDA within each single view, while consistency among different views is enhanced based on HSIC. Accordingly, both intra-view class separability and inter-view correlation are well addressed in a unified framework. The learned metrics can improve multi-view classification, and experimental results on real-world datasets demonstrate the effectiveness of the proposed method.
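The HSIC term used to enforce inter-view consistency has a standard empirical form, trace(KHLH)/(n-1)^2 with centered kernel matrices; a minimal sketch with linear kernels (an illustrative choice, not necessarily the kernels used in FISH-MML):

```python
import numpy as np

def hsic(X, Y):
    """Empirical (biased) HSIC between two views, using linear kernels.

    X, Y: (n, d1) and (n, d2) feature matrices for the same n instances.
    Returns trace(K H L H) / (n - 1)^2, where H centers the kernels.
    A larger value indicates stronger dependence between the views.
    """
    n = X.shape[0]
    K = X @ X.T                      # linear kernel on view 1
    L = Y @ Y.T                      # linear kernel on view 2
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2
```

With linear kernels this quantity equals the squared Frobenius norm of the cross-covariance between the centered views, so maximizing it encourages the two learned embeddings to agree.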
Objective: To investigate the relationship between the expression of proliferating cell nuclear antigen (PCNA) and the clinicobiological characteristics and prognosis of ovarian epithelial carcinoma. Methods: 84 specimens of ovarian epithelial carcinoma were examined for PCNA expression using the labelling streptavidin-biotin (LSAB) immunohistochemistry assay, and the correlations of PCNA expression with tumor stage, histological grade, and histologic subtype, as well as with patient age and prognosis, were analyzed. Results: All tumors expressed PCNA; the positive rate was 100%. There were significant correlations between the expression grade of PCNA and patient age (P < 0.05) and histological grade (P < 0.01). No significant correlation was found between the expression grade of PCNA and stage or histologic subtype (P > 0.05). In univariate analysis, PCNA expression was associated with poor survival in ovarian epithelial carcinoma, but it did not retain any prognostic significance in multivariate analysis. Conclusions: PCNA expression in ovarian epithelial carcinoma is high. Tumors with lower histological grade, or in older patients, express higher levels of PCNA. PCNA expression is not an independent prognostic factor in epithelial ovarian carcinoma; stage and histological grade remain the significant prognostic factors in epithelial ovarian cancer.
The manifold of symmetric positive definite (SPD) matrices has drawn significant attention because of its widespread applications. SPD matrices provide compact nonlinear representations of data and form a special type of Riemannian manifold. The direct application of support vector machines on the SPD manifold may fail due to a lack of samples per class. In this paper, we propose a support vector metric learning (SVML) model on the SPD manifold. We define a positive definite kernel for point pairs on the SPD manifold and transform metric learning on the SPD manifold into a point-pair classification problem. The metric learning problem can then be efficiently solved by standard support vector machines. Compared with directly classifying points on the SPD manifold by support vector machines, SVML effectively learns a distance metric for SPD matrices by training a binary support vector machine model. Experiments on video-based face recognition, image set classification, and material classification show that SVML outperforms the state-of-the-art metric learning algorithms on the SPD manifold.
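The paper defines its own point-pair kernel; as background, a commonly used positive definite construction on the SPD manifold is the log-Euclidean Gaussian kernel, sketched here for single points (the assembly into a point-pair kernel is left out):

```python
import numpy as np

def spd_logm(A):
    """Matrix logarithm of an SPD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.log(w)) @ V.T     # V @ diag(log w) @ V^T

def log_euclidean_dist(A, B):
    """Log-Euclidean distance: || logm(A) - logm(B) ||_F."""
    return np.linalg.norm(spd_logm(A) - spd_logm(B))

def spd_kernel(A, B, gamma=1.0):
    """Gaussian kernel on SPD matrices built from the log-Euclidean
    distance; kernels of this form are known to be positive definite."""
    return np.exp(-gamma * log_euclidean_dist(A, B) ** 2)
```

The log-Euclidean map flattens the manifold so that ordinary kernel machinery, including SVM training, applies directly.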
Image-based salient object detection is a useful and important technique, which can promote the efficiency of several applications such as object detection, image classification/retrieval, object co-segmentation, and content-based image editing. In this letter, we present a novel weighted low-rank matrix recovery (WLRR) model for salient object detection. In order to facilitate efficient separation of salient objects from the background, a high-level background prior map is estimated by exploiting color, location, and boundary-connectivity properties, and this prior map is then integrated into a weighting matrix that indicates the likelihood that each image region belongs to the background. The final salient object detection task is formulated as the WLRR model with this weighting matrix. Both quantitative and qualitative experimental results on three challenging datasets show competitive performance compared with 24 state-of-the-art methods.
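A weighted low-rank decomposition of the kind described can be sketched with alternating proximal steps: singular value thresholding for the low-rank background and weighted soft-thresholding for the salient part. This is a generic sketch under an assumed objective min ||L||_* + lam * ||W * S||_1, not the letter's exact optimization:

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ (np.maximum(s - tau, 0.0)[:, None] * Vt)

def wlrr(F, W, lam=0.1, n_iter=100):
    """Toy weighted low-rank recovery: F ~ L (background) + S (salient).

    W holds per-entry background-likelihood weights: large weights make
    the corresponding entries expensive to place in the salient part S.
    """
    L = np.zeros_like(F)
    S = np.zeros_like(F)
    for _ in range(n_iter):
        L = svt(F - S, 1.0)                                  # low-rank update
        R = F - L
        S = np.sign(R) * np.maximum(np.abs(R) - lam * W, 0)  # weighted shrink
    return L, S
```

Regions the prior marks as background receive large weights, so they are absorbed by the low-rank term rather than flagged as salient.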
Recent advancements in large vision-language models (LVLMs) have demonstrated impressive capability in understanding visual information through human language. Despite these advances, LVLMs still face challenges with multimodal hallucination, such as generating text descriptions of objects that are not present in the visual input. However, the underlying fundamental causes of multimodal hallucinations remain poorly explored. In this paper, we propose a new perspective, suggesting that inherent biases in LVLMs might be a key factor in hallucinations. Specifically, we systematically identify a semantic-shift bias related to paragraph breaks ('\n\n'): in the training data, the content before and after '\n\n' frequently exhibits significant semantic changes. This pattern leads the model to infer that the content following '\n\n' should differ markedly from the preceding, less hallucinatory content, thereby increasing the probability of hallucinatory descriptions after the '\n\n'. We have validated this hypothesis on multiple publicly available LVLMs. Moreover, we find that deliberately inserting '\n\n' into the generated description can induce more hallucinations. A simple method is proposed to effectively mitigate the hallucination of LVLMs by skipping the output of '\n'.
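The proposed mitigation, skipping the output of '\n', can be sketched as logit masking at decoding time (the token ids that encode '\n' are tokenizer-dependent and assumed here):

```python
import numpy as np

def mask_newline(logits, newline_ids):
    """Bias decoding away from newline tokens by setting their logits
    to -inf, so paragraph breaks ('\n\n') are never emitted.

    logits: (vocab_size,) array of next-token logits.
    newline_ids: iterable of token ids that encode '\n' (assumed known).
    """
    out = logits.copy()              # leave the caller's logits untouched
    out[list(newline_ids)] = -np.inf
    return out

def greedy_next(logits, newline_ids):
    """Greedy next-token choice after masking out newline tokens."""
    return int(np.argmax(mask_newline(logits, newline_ids)))
```

In a real decoder the same masking would be applied at each generation step before sampling or argmax.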
To improve the discriminability of attribute representations, in this paper we propose to extend traditional attribute representations by embedding the latent high-order structure among attributes. Specifically, our aim is to construct Latent Extended Attribute Features (LEAF) for visual classification. Since there exist only weak labels for each attribute, we first propose a feature selection method to explore the common feature structures across categories. After that, attribute classifiers are trained on the selected features. Then, a category-specific graph is introduced, composed of single attributes and their co-occurring attribute pairs. This attribute graph is used as the initial representation of each image. Given our aim, we need to discover the discriminative latent structure among attributes and train robust category classifiers. To that end, we develop a joint learning objective function composed of a high-order representation mining term and a classifier training term. The mining term both preserves category-specific information and discovers the common structure between categories. Based on the discovered representation, robust visual classifiers can be trained by the classifier term. Finally, an alternating optimization method is designed to seek the optimal solution of our objective function. Experimental results on challenging datasets demonstrate the advantages of our proposed model over existing work.
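The initial graph representation built from single attributes and co-occurring attribute pairs can be sketched as follows, using the product of classifier scores as an illustrative co-occurrence term (the paper's exact construction may differ):

```python
import numpy as np
from itertools import combinations

def attribute_graph(scores):
    """Initial graph representation of an image: attribute classifier
    scores (nodes) concatenated with pairwise co-occurrence terms (edges).

    scores: (k,) array of attribute scores.
    Returns a (k + k*(k-1)/2,) vector; the pairwise terms here are
    simple score products, an illustrative choice.
    """
    pairs = [scores[i] * scores[j]
             for i, j in combinations(range(len(scores)), 2)]
    return np.concatenate([scores, np.array(pairs)])
```

This extended vector is what a subsequent mining step could prune or reweight to keep only the discriminative high-order structure.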
The early postnatal period witnesses rapid and dynamic brain development. However, the relationship between brain anatomical structure and cognitive ability is still unknown, and there is currently no explicit model characterizing this relationship in the literature. In this paper, we explore this relationship by investigating the mapping between morphological features of the cerebral cortex and cognitive scores. To this end, we introduce a multi-view multi-task learning approach that exploits complementary information from different time-points while simultaneously handling the missing-data issue inherent in longitudinal studies. Accordingly, we establish a novel model, latent partial multi-view representation learning. Our approach regards data from different time-points as different views and constructs a latent representation to capture the complementary information from incomplete time-points. The latent representation exploits the complementarity across different time-points and improves the accuracy of prediction. The minimization problem is solved by the alternating direction method of multipliers. Experimental results on both synthetic and real data validate the effectiveness of our proposed algorithm.
Although multi-view learning has made significant progress over the past few decades, it is still challenging due to the difficulty of modeling complex correlations among different views, especially in the context of view missing. To address this challenge, we propose a novel framework termed Cross Partial Multi-View Networks (CPM-Nets), which aims to fully and flexibly take advantage of multiple partial views. We first provide a formal definition of completeness and versatility for multi-view representation and then theoretically prove the versatility of the learned latent representations. For completeness, the task of learning the latent multi-view representation is specifically translated into a degradation process that mimics data transmission, such that the optimal tradeoff between consistency and complementarity across different views can be implicitly achieved. Equipped with an adversarial strategy, our model stably imputes missing views, encoding information from all views of each sample into the latent representation to further enhance completeness. Furthermore, a nonparametric classification loss is introduced to produce structured representations and prevent overfitting, which endows the algorithm with promising generalization under view-missing cases. Extensive experimental results validate the effectiveness of our algorithm over existing state-of-the-art methods for classification, representation learning, and data imputation.