Image-based salient object detection is a useful and important technique that can improve the efficiency of several applications, such as object detection, image classification/retrieval, object co-segmentation, and content-based image editing. In this letter, we present a novel weighted low-rank matrix recovery (WLRR) model for salient object detection. To facilitate efficient separation of salient objects from the background, a high-level background prior map is estimated by exploiting color, location, and boundary-connectivity properties, and this prior map is then encoded into a weighting matrix that indicates the likelihood that each image region belongs to the background. The final salient object detection task is formulated as the WLRR model with this weighting matrix. Quantitative and qualitative experimental results on three challenging datasets show competitive performance compared with 24 state-of-the-art methods.
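As a rough guide to the formulation, the abstract suggests a robust-PCA-style decomposition in which the weighting matrix modulates the sparsity penalty. The following is a plausible sketch of such an objective; the exact WLRR formulation is not given here, so the symbols and the nuclear-norm/weighted-l1 form are assumptions:

```latex
% Plausible WLRR objective (assumption based on the abstract):
% F is the region feature matrix, L the low-rank background part,
% S the sparse salient part, W the background weighting matrix.
\min_{L,S}\; \|L\|_{*} + \lambda\,\|W \odot S\|_{1}
\quad \text{s.t.} \quad F = L + S
```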
Identifying soil constitutive model parameters is essential for geotechnical engineering analysis. Soil constitutive model parameters are usually calibrated from laboratory or field test data, which can be formulated as an inverse analysis problem. Nevertheless, test data and testing budgets in engineering are limited, so the available data may be insufficient to identify the unknown model parameters. Identifiability, that is, whether the data provide enough information for parameter identification, is rarely discussed for inverse analysis in geotechnical engineering. This paper explores the identifiability of the modified Cam clay (MCC) model parameters from triaxial test data using the Bayesian method. In this study, the sequential nature of triaxial data is accounted for by modelling the data with a probabilistic state space model. Results show that the MCC model parameters may be unidentifiable from consolidated undrained triaxial data, whereas they are globally identifiable from consolidated drained test data. In addition, the maximum a posteriori (MAP) estimate and the mean values obtained from the joint posterior distribution are close to the true values when the MCC model parameters are globally identifiable; this is not the case for unidentifiable parameters.
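To make the inverse-analysis setup concrete, the sketch below illustrates MAP estimation of a three-parameter vector from sequential test data under Gaussian noise. The forward model is a stand-in placeholder, not the actual MCC constitutive law, and the priors and noise level are illustrative assumptions:

```python
import numpy as np
from scipy import optimize

def forward_model(theta, t):
    # Placeholder for the MCC constitutive response at load steps t;
    # a real implementation would integrate the MCC stress-strain law.
    M, lam, kappa = theta
    return M * np.log1p(t) + (lam - kappa) * t   # illustrative only

def neg_log_posterior(theta, y, sigma=0.05):
    t = np.arange(1, len(y) + 1, dtype=float)
    resid = y - forward_model(theta, t)
    nll = 0.5 * np.sum(resid ** 2) / sigma ** 2  # Gaussian likelihood
    nlp = 0.5 * np.sum((theta - 1.0) ** 2)       # weak Gaussian prior
    return nll + nlp

y_obs = np.random.default_rng(0).normal(size=20)  # stand-in test data
res = optimize.minimize(neg_log_posterior, x0=np.ones(3), args=(y_obs,))
print("MAP estimate:", res.x)
```

Incidentally, in this toy forward model lam and kappa enter only through their difference, so the data alone cannot separate them; the prior is what pins them down, which mirrors the identifiability issue the paper studies.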
Multimodal learning aims to integrate complementary information from different modalities for more reliable decisions. However, existing multimodal classification methods simply integrate the learned local features, ignoring the underlying structure of each modality and the higher-order correlations across modalities. In this paper, we propose a novel Hierarchical Attention Learning Network (HALNet) for multimodal classification. Specifically, HALNet has three merits: 1) a hierarchical feature fusion module learns multi-level features and aggregates them into a global feature representation using an attention mechanism and a progressive fusion tactic; 2) a cross-modal higher-order fusion module captures potential cross-modal correlations in the label space; 3) a dual prediction pattern generates credible decisions. Extensive experiments on three real-world multimodal datasets demonstrate that HALNet achieves competitive performance compared to the state-of-the-art.
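As an illustration of attention-based multi-level aggregation (the first merit above), the following minimal PyTorch sketch pools several feature levels with learned attention weights; the module name and internals are assumptions, not HALNet's actual implementation:

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Pools a list of per-level features with learned attention weights."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)   # one attention score per level

    def forward(self, feats):            # feats: list of (B, dim) tensors
        x = torch.stack(feats, dim=1)    # (B, n_levels, dim)
        w = torch.softmax(self.score(x), dim=1)  # weights over levels
        return (w * x).sum(dim=1)        # attention-weighted global feature

fusion = AttentionFusion(dim=128)
levels = [torch.randn(4, 128) for _ in range(3)]  # three feature levels
print(fusion(levels).shape)              # torch.Size([4, 128])
```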
Remote sensing change detection (RSCD), which identifies the changed and unchanged pixels from a registered pair of remote sensing images, has enjoyed remarkable success recently. However, locating changed objects with fine structural details is still a challenging problem in RSCD. In this paper, we propose a novel remote sensing change detection network via temporal feature interaction and guided refinement (TFI-GR) to address this issue. Specifically, unlike previous methods, which employ just a single concatenation or subtraction operation for bi-temporal feature fusion, we design a temporal feature interaction module (TFIM) to enhance the interaction between bi-temporal features and capture temporal difference information at diverse feature levels. Afterward, a guided refinement module (GRM), which aggregates both low- and high-level temporal difference representations to polish the location information of high-level features and filter the background clutter of low-level features, is repeatedly applied. Finally, the multi-level temporal difference features are progressively fused to generate change maps for change detection. To demonstrate the effectiveness of the proposed TFI-GR, comprehensive experiments are performed on three high-spatial-resolution remote sensing change detection datasets. Experimental results indicate that the proposed method is superior to other state-of-the-art change detection methods. The demo code of this work is publicly available at https://github.com/guanyuezhen/TFI-GR.
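To illustrate the idea behind TFIM, the sketch below lets bi-temporal features enhance each other through a shared gate before the temporal difference is taken; the internal design is an assumption based on the abstract, not the published module:

```python
import torch
import torch.nn as nn

class TFIM(nn.Module):
    """Bi-temporal interaction before differencing (illustrative internals)."""
    def __init__(self, c):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(2 * c, c, 1), nn.Sigmoid())
        self.fuse = nn.Conv2d(c, c, 3, padding=1)

    def forward(self, f1, f2):                     # (B, C, H, W) each
        g = self.gate(torch.cat([f1, f2], dim=1))  # cross-temporal gate
        f1e, f2e = f1 + g * f2, f2 + g * f1        # mutual enhancement
        return self.fuse(torch.abs(f1e - f2e))     # temporal difference

m = TFIM(64)
d = m(torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32))
print(d.shape)  # torch.Size([2, 64, 32, 32])
```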
This paper proposes three simple, compact yet effective representations of depth sequences, referred to respectively as dynamic depth images (DDI), dynamic depth normal images (DDNI), and dynamic depth motion normal images (DDMNI), for both isolated and continuous action recognition. These dynamic images are constructed from a segmented sequence of depth maps using hierarchical bidirectional rank pooling to effectively capture the spatial-temporal information. Specifically, DDI exploits the dynamics of postures over time, while DDNI and DDMNI exploit the 3-D structural information captured by depth maps. Building on the proposed representations, a convolutional neural network (ConvNet)-based method is developed for action recognition. The image-based representations enable us to fine-tune existing ConvNet models trained on image data without training a large number of parameters from scratch. The proposed method achieved state-of-the-art results on three large datasets, namely, the large-scale continuous gesture recognition dataset (mean Jaccard index of 0.4109), the large-scale isolated gesture recognition dataset (59.21%), and the NTU RGB+D dataset (87.08% cross-subject and 84.22% cross-view), even though only the depth modality was used.
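For intuition about rank pooling, the sketch below maps a sequence of vectorized depth frames to a single "dynamic image" by fitting a linear ranker that preserves frame order; this is a simplified least-squares approximation, not the hierarchical bidirectional variant used in the paper:

```python
import numpy as np

def rank_pool(frames):                 # frames: (T, D) vectorized depth maps
    T = frames.shape[0]
    # Time-varying mean smoothing, as in common rank-pooling practice.
    v = np.cumsum(frames, axis=0) / np.arange(1, T + 1)[:, None]
    t = np.arange(1, T + 1, dtype=float)
    # Solve least-squares u with v @ u ~ t, so u scores frames by order;
    # the parameter vector u itself is the dynamic image.
    u, *_ = np.linalg.lstsq(v, t, rcond=None)
    return u

seq = np.random.rand(30, 64 * 64)      # 30 depth frames, 64x64, flattened
ddi = rank_pool(seq).reshape(64, 64)   # dynamic depth image
print(ddi.shape)                       # (64, 64)
```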
Multi-view clustering (MvC) aims to integrate information from different views to enhance the capability of the model in capturing the underlying data structures. The widely used joint training paradigm in MvC may not fully leverage the multi-view information, owing to imbalanced and under-optimized view-specific features caused by the uniform learning objective applied to all views. For instance, particular views with more discriminative information can dominate the learning process in the joint training paradigm, leaving other views under-optimized. To alleviate this issue, we first analyze the imbalance phenomenon in the joint training paradigm of multi-view clustering from the perspective of the gradient descent applied to each view-specific feature extractor. Then, we propose a novel balanced multi-view clustering (BMvC) method, which introduces a view-specific contrastive regularization (VCR) to modulate the optimization of each view. Concretely, VCR preserves the sample similarities captured from the joint features and the view-specific ones into the clustering distributions corresponding to view-specific features, enhancing the learning process of view-specific feature extractors. Additionally, a theoretical analysis illustrates that VCR adaptively modulates the magnitudes of the gradients used to update the parameters of view-specific feature extractors, achieving a balanced multi-view learning procedure. In this manner, BMvC achieves a better trade-off between the exploitation of view-specific patterns and the exploration of view-invariant patterns, fully learning the multi-view information for the clustering task. Finally, experiments on eight benchmark MvC datasets and two spatially resolved transcriptomics datasets verify the superiority of the proposed method over state-of-the-art approaches.
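The following sketch illustrates a VCR-style regularizer in which sample similarities from the joint features supervise the similarities induced by one view's clustering distribution; the loss form, temperature, and names are assumptions derived from the abstract:

```python
import torch
import torch.nn.functional as F

def vcr_loss(joint_feat, view_logits, tau=0.5):
    """Align view-induced sample similarities with joint-feature ones."""
    z = F.normalize(joint_feat, dim=1)
    target = torch.softmax(z @ z.t() / tau, dim=1)   # joint-feature similarity
    q = torch.softmax(view_logits, dim=1)            # clustering distribution
    sim = torch.log_softmax(q @ q.t() / tau, dim=1)  # view-induced similarity
    return F.kl_div(sim, target, reduction="batchmean")

# 8 samples: 32-dim joint features, 10 cluster logits from one view.
loss = vcr_loss(torch.randn(8, 32), torch.randn(8, 10))
print(loss.item())
```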
Deep learning-based hyperspectral image (HSI) classification and object detection techniques have gained significant attention due to their vital role in image content analysis, interpretation, and broader HSI applications. However, current hyperspectral object detection approaches predominantly emphasize either spectral or spatial information, overlooking the valuable complementary relationship between the two. In this study, we present a novel Spectral-Spatial Aggregation (S2ADet) object detector that effectively harnesses the rich spectral and spatial complementary information inherent in hyperspectral images. S2ADet comprises a hyperspectral information decoupling (HID) module, a two-stream feature extraction network, and a one-stage detection head. The HID module processes hyperspectral data by aggregating spectral and spatial information via band selection and principal component analysis, thereby reducing redundancy. Based on the resulting spectral and spatial aggregation information, we propose a two-stream feature aggregation network in which spectral and spatial features interact. Furthermore, to address the limitations of existing databases, we annotate an extensive dataset, designated HOD3K, containing 3,242 hyperspectral images captured across diverse real-world scenes and encompassing three object classes. These images have a resolution of 512×256 pixels and cover 16 bands ranging from 470 nm to 620 nm. Comprehensive experiments on two datasets demonstrate that S2ADet surpasses existing state-of-the-art methods, achieving robust and reliable results. The demo code and dataset of this work are publicly available at https://github.com/hexiao-cs/S2ADet.
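As a minimal illustration of the HID-style decoupling, the sketch below produces a spectral branch via simple variance-based band selection and a spatial branch via PCA; the selection criterion and dimensions are assumptions, not the paper's exact procedure:

```python
import numpy as np

def hid_decouple(cube, k_bands=3, k_pcs=3):        # cube: (H, W, B)
    H, W, B = cube.shape
    flat = cube.reshape(-1, B)
    # Spectral branch: keep the k most informative bands (variance proxy).
    idx = np.argsort(flat.var(axis=0))[-k_bands:]
    spectral = cube[..., idx]                      # (H, W, k_bands)
    # Spatial branch: compress all bands with PCA via SVD.
    x = flat - flat.mean(axis=0)
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    spatial = (x @ vt[:k_pcs].T).reshape(H, W, k_pcs)
    return spectral, spatial

spec, spat = hid_decouple(np.random.rand(32, 32, 16))
print(spec.shape, spat.shape)  # (32, 32, 3) (32, 32, 3)
```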
The rapid advancement of the Internet has brought exponential growth in network traffic. At present, devices deployed at edge nodes process huge amounts of data, extract key features of network traffic, and then forward them to the cloud server/data center. However, since mobile terminal devices lag behind in identifying and classifying encrypted and malicious traffic, how to identify network traffic more efficiently and accurately remains a challenging problem. We design a convolutional neural network model, a one-dimensional convolutional neural network for hexadecimal data (HexCNN-1D), that combines normalization with attention mechanisms. Network traffic is classified and identified by adding two attention modules, the Global Attention Block (GAB) and the Category Attention Block (CAB). By extracting payload information from hexadecimal network traffic, our model can identify most categories of network traffic, including encrypted and malicious traffic. The experimental results show an average accuracy of 98.8%. Our model can greatly improve the accuracy of network traffic recognition.
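A minimal sketch of the overall architecture idea follows: a 1-D CNN over normalized payload bytes with a global attention pooling step. Layer sizes and the attention design are illustrative assumptions and do not reproduce the paper's GAB/CAB modules:

```python
import torch
import torch.nn as nn

class HexCNN1D(nn.Module):
    """1-D CNN over payload bytes with global attention pooling (sketch)."""
    def __init__(self, n_classes, length=784):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 32, 5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(32, 64, 5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
        )
        self.attn = nn.Conv1d(64, 1, 1)           # global attention weights
        self.fc = nn.Linear(64, n_classes)

    def forward(self, x):                         # x: (B, 1, length) in [0, 1]
        h = self.conv(x)                          # (B, 64, L')
        w = torch.softmax(self.attn(h), dim=-1)   # (B, 1, L')
        return self.fc((h * w).sum(dim=-1))       # attention-pooled logits

model = HexCNN1D(n_classes=12)
print(model(torch.rand(2, 1, 784)).shape)  # torch.Size([2, 12])
```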
In this paper, we propose a hyperspectral band selection method based on spatial-spectral weighted region-wise multiple graph fusion spectral clustering, referred to as RMGF for short. Considering that different objects have different reflection characteristics, we use a superpixel segmentation algorithm to segment the first principal component of the original hyperspectral image cube into homogeneous regions. For each superpixel, we construct a similarity graph that reflects the similarity between band pairs. Then, a multiple graph diffusion strategy with a theoretical convergence guarantee is designed to learn a unified graph for partitioning the whole hyperspectral cube into several subcubes via spectral clustering. During the graph diffusion process, the spatial and spectral information of each superpixel is embedded so that spatially/spectrally similar superpixels contribute more to each other. Finally, the band containing the least noise in each subcube is selected to represent the whole subcube. Extensive experiments on three public datasets validate the superiority of the proposed method over other state-of-the-art ones.
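To make the pipeline concrete, the sketch below fuses several per-region band-similarity graphs into one graph and applies spectral clustering to partition bands into subcubes; simple averaging stands in for the paper's diffusion strategy, and all details are assumptions:

```python
import numpy as np

def fuse_and_cluster(graphs, k):        # graphs: list of (B, B) similarities
    W = sum(graphs) / len(graphs)       # average fusion (paper uses diffusion)
    L = np.diag(W.sum(axis=1)) - W      # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(L)
    emb = vecs[:, :k]                   # spectral embedding of the bands
    # Tiny self-contained k-means (Lloyd iterations) on the embedding.
    rng = np.random.default_rng(0)
    cent = emb[rng.choice(len(emb), k, replace=False)]
    for _ in range(20):
        lab = np.argmin(((emb[:, None] - cent) ** 2).sum(-1), axis=1)
        cent = np.array([emb[lab == j].mean(0) if np.any(lab == j) else cent[j]
                         for j in range(k)])
    return lab                          # cluster (subcube) label per band

B = 16
gs = [np.random.rand(B, B) for _ in range(4)]
gs = [(g + g.T) / 2 for g in gs]        # symmetrize each region graph
print(fuse_and_cluster(gs, k=4))
```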