This paper studies direction-of-arrival (DOA) estimation of signals in impulsive noise environments. The impulsive noise is modeled as complex isotropic symmetric alpha-stable (SαS) random variables, and a novel second-order statistic, the correntropy-based covariance matrix (CBCM), is defined. Combining the CBCM of the array sensor outputs with a signal subspace technique (e.g., multiple signal classification (MUSIC)) achieves source localization under impulsive noise. Monte Carlo simulation results illustrate the improved performance of CBCM-MUSIC for DOA estimation under a wide range of impulsive noise conditions.
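As a rough orientation only, the two standard ingredients named above can be written down as follows. The abstract does not give the exact form of the CBCM, so it is not reproduced here; the symbols $\sigma$ (kernel size), $U_s$, $U_n$ (signal and noise subspace bases), and $a(\theta)$ (array steering vector) are notation assumed for this sketch, and the Gaussian kernel is shown up to its normalization constant.

```latex
% Gaussian-kernel correntropy of two random variables X and Y (kernel size \sigma),
% written up to the kernel's normalization constant
V_\sigma(X, Y) = \mathbb{E}\left[ \exp\left( -\frac{|X - Y|^2}{2\sigma^2} \right) \right]

% Eigendecomposition of a covariance-type matrix R of the array outputs
% (in CBCM-MUSIC, R would be the CBCM rather than the sample covariance)
R = U_s \Lambda_s U_s^H + U_n \Lambda_n U_n^H

% MUSIC pseudospectrum; DOA estimates are the locations of its largest peaks
P_{\mathrm{MUSIC}}(\theta) = \frac{1}{a^H(\theta)\, U_n U_n^H\, a(\theta)}
```

The subspace step is unchanged from conventional MUSIC; only the matrix being decomposed differs.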
Detecting nodes with erroneous information in graphs is important yet challenging, due to the lack of examples and the diversified scenarios of errors. We introduce GEDet, a few-shot learning based framework to detect erroneous nodes in graphs. GEDet consists of two novel components, each addressing a unique challenge. (1) To cope with the lack of examples, we introduce a graph augmentation module to enrich training labels. The module not only generates additional synthetic training labels by simulating different erroneous scenarios, but also exploits non-local relations to enrich neighborhood information. (2) To further improve the accuracy, we introduce an adversarially learned module that can better detect erroneous nodes by distinguishing nodes with synthetic and real labels encoded by graph autoencoders. Unlike conventional error detection models, GEDet yields effective classifiers that are optimized for a few yet diversified examples in the presence of multiple error scenarios. We show that using only a small number of examples, GEDet significantly improves over competing methods such as constraint-based detection and anomaly detection, with a gain of 35% on recall and 30% on precision.
This paper studies the problem of subgraph query generation with guarantees on both diversity and group fairness. Given a query template (with parameterized search predicates) and a set of node groups in a graph, the problem is to compute a set of subgraph queries that instantiate the query template, such that each query ensures diversified answers that also cover each group with a desired number of nodes. Such a need is evident in web and social search with fairness constraints, query optimization, and query benchmarking. We formalize a bi-criteria optimization problem that aims to find a Pareto optimal set of query instances in terms of diversity and fairness measures. We show the problem is in $\Delta_2^P$ and verify its hardness (NP-hard and fixed-parameter tractable). We provide (1) two efficient algorithms that approximate Pareto optimal sets with ε-dominance relations, yielding representative query instances with a bounded size, and (2) an online algorithm that progressively generates and maintains a fixed-size ε-Pareto set with small delay time. We experimentally verify that our algorithms can efficiently generate queries with desired diversity and coverage properties for targeted groups.
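To make the ε-dominance idea concrete, here is a minimal sketch that thins a set of (diversity, fairness) score pairs into a small ε-Pareto set. It is not the paper's algorithm: the multiplicative (1 + ε) form of ε-dominance, the assumption that both scores are non-negative and maximized, and all function names are choices made for this illustration.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def eps_dominates(a, b, eps):
    """True if a, inflated by (1 + eps), is at least as good as b in every objective."""
    return all((1 + eps) * x >= y for x, y in zip(a, b))


def pareto_front(points):
    """Points that no other point dominates (both objectives maximized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def eps_pareto_set(points, eps):
    """Greedily keep front points that no already-kept point eps-dominates."""
    kept = []
    for p in sorted(pareto_front(points), reverse=True):
        if not any(eps_dominates(k, p, eps) for k in kept):
            kept.append(p)
    return kept


# Hypothetical (diversity, fairness) scores of candidate query instances.
scores = [(1.0, 9.0), (2.0, 8.8), (5.0, 5.0), (5.2, 4.9), (9.0, 1.0), (3.0, 3.0)]
representatives = eps_pareto_set(scores, eps=0.1)
```

Every input point is ε-dominated by some kept point, because each point is dominated by a front point and each front point is either kept or ε-dominated by a kept one. A larger ε gives a smaller representative set.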
To meet the need for semantic, personalized commodity recommendation in e-commerce, a model of a personalized recommendation system based on fuzzy semantics is built to describe integrated user interest features and commodity information. The user's degree of interest in a commodity and the degree of correlation between commodities and interests are described using FALC syntax, and a related algorithm is used to mine user interest data. The experiment shows that the personalized recommendation method and model based on fuzzy semantics enable the e-commerce system to describe fuzzy commodity concepts and recommend more appropriate goods to users.
Bio-inspired metaheuristic algorithms have been widely proposed in recent years to estimate the parameters of photovoltaic (PV) models, owing to their ability to handle nonlinear functions without derivative information. However, these algorithms normally use multiple agents or particles in the search process, and searching the whole domain for candidate solutions takes considerable time on sequential computing devices. This paper proposes a parallel particle swarm optimization (PPSO) method to extract and estimate the parameters of a PV model. The algorithm is implemented in OpenCL and executed on NVIDIA multi-core GPUs. The simulation results show that the proposed method accelerates computation while achieving the same accuracy as sequential particle swarm optimization (PSO).
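For orientation, a sequential PSO loop can be sketched in a few lines of plain Python. This is not the paper's OpenCL implementation: the inertia and acceleration coefficients, the swarm size, the choice of the single-diode model as the PV model, the thermal voltage value, and every function name below are assumptions made for illustration. The per-particle fitness evaluation is marked in the code because each evaluation is independent of the others, which is what makes the step a natural candidate for parallel execution.

```python
import math
import random


def single_diode_rmse(params, data, vt=0.02585):
    """RMSE between measured current and a single-diode model (assumed PV model)."""
    iph, i0, n, rs, rsh = params
    sq = 0.0
    for v, i in data:
        model = iph - i0 * (math.exp((v + i * rs) / (n * vt)) - 1.0) - (v + i * rs) / rsh
        sq += (model - i) ** 2
    return math.sqrt(sq / len(data))


def pso(objective, bounds, n_particles=30, n_iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` inside box `bounds` with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clamp to the box
        # Fitness of each particle is independent of the others: the parallelizable step.
        vals = [objective(p) for p in pos]
        for i, v in enumerate(vals):
            if v < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], v
                if v < gbest_val:
                    gbest, gbest_val = pos[i][:], v
    return gbest, gbest_val


# Toy usage on a shifted quadratic, standing in for a PV-model objective.
best, best_val = pso(lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2, [(-5.0, 5.0)] * 2)
```

To fit PV data, `single_diode_rmse` could be passed as the objective with `data` bound to measured (voltage, current) pairs, for example through a lambda, together with bounds chosen for the five parameters.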
Recent AI agents, such as ChatGPT and LLaMA, primarily rely on instruction tuning and reinforcement learning to calibrate the output of large language models (LLMs) with human intentions, ensuring the outputs are harmless and helpful. Existing methods depend heavily on the manual annotation of high-quality positive samples, while contending with issues such as noisy labels and minimal distinctions between preferred and dispreferred response data. However, readily available toxic samples with clear safety distinctions are often filtered out, removing valuable negative references that could aid LLMs in safety alignment. In response, we propose PT-ALIGN, a novel safety self-alignment approach that minimizes human supervision by automatically refining positive and toxic samples and performing fine-grained dual instruction tuning. Positive samples are harmless responses, while toxic samples deliberately contain extremely harmful content, serving as a new supervisory signal. Specifically, we use the LLM itself to iteratively generate and refine training instances, starting from fewer than 50 human annotations. We then employ two losses, i.e., maximum likelihood estimation (MLE) and fine-grained unlikelihood training (UT), to jointly enhance the LLM's safety. The MLE loss encourages an LLM to maximize the generation of harmless content based on positive samples. Conversely, the fine-grained UT loss guides the LLM to minimize the output of harmful words based on negative samples at the token level, thereby guiding the model to decouple safety from effectiveness, directing it toward safer fine-tuning objectives, and increasing the likelihood of generating helpful and reliable content. Experiments on 9 popular open-source LLMs demonstrate the effectiveness of PT-ALIGN for safety alignment while maintaining comparable levels of helpfulness and usefulness.
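The interplay of the two losses can be illustrated with a small numerical sketch. It assumes the standard unlikelihood form, -log(1 - p), applied only to tokens flagged as harmful. The mask, the weighting factor `ut_weight`, the averaging, and the function names are assumptions for illustration, not details taken from the abstract.

```python
import math

_EPS = 1e-12  # keeps log() away from zero


def mle_loss(token_probs):
    """Mean negative log-likelihood of the tokens of a positive (harmless) sample."""
    return -sum(math.log(max(p, _EPS)) for p in token_probs) / len(token_probs)


def unlikelihood_loss(token_probs, harmful_mask):
    """Mean of -log(1 - p) over the tokens of a toxic sample flagged as harmful."""
    terms = [-math.log(max(1.0 - p, _EPS))
             for p, flagged in zip(token_probs, harmful_mask) if flagged]
    return sum(terms) / len(terms) if terms else 0.0


def joint_loss(pos_probs, tox_probs, tox_mask, ut_weight=1.0):
    """Weighted sum of the MLE term and the token-level unlikelihood term."""
    return mle_loss(pos_probs) + ut_weight * unlikelihood_loss(tox_probs, tox_mask)


# Hypothetical model probabilities for the reference tokens of each sample.
positive = [0.5, 0.5]
toxic = [0.9, 0.2]        # the first token is flagged as harmful
toxic_mask = [1, 0]
loss = joint_loss(positive, toxic, toxic_mask)
```

The MLE term shrinks as the model assigns higher probability to harmless tokens, while the unlikelihood term shrinks as it assigns lower probability to the flagged harmful tokens. Unflagged tokens in the toxic sample contribute nothing.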
In our previous work, we demonstrated an integrated proteome analysis device (iPAD-100) to analyze proteomes from 100 cells. (1) In this work, for the first time, a novel integrated device for single-cell analysis (iPAD-1) was developed to profile proteins in a single cell within 1 h. In the iPAD-1, a selected single cell was directly sucked into a 22 μm i.d. capillary. Cell lysis and protein digestion were then accomplished simultaneously in the capillary in a 2 nL volume, which could prevent protein loss and excessive dilution. Digestion was accelerated by using elevated temperature with ultrasonication. The total time of cell treatment was 30 min. After that, the single-cell digest peptides were transferred directly into an LC column through a true zero dead volume union to minimize protein transfer loss. A homemade 22 μm i.d. nano-LC packing column with a 3 μm i.d. ESI tip was used in the device to achieve ultrasensitive detection. A 30 min elution program was applied to the analysis of the single-cell proteome. Therefore, the total time needed for a single-cell analysis was only 1 h. In an analysis of 10 single HeLa cells, a maximum of 328 proteins were identified in one cell by using an Orbitrap Fusion Tribrid MS instrument, and the detection limit was estimated at around 1.7-170 zmol. The sensitivity of the iPAD-1 was ∼120-fold higher than that of our previously developed iPAD-100 system. (1) Prominent cellular heterogeneity in protein expression profiles was observed. Furthermore, we roughly estimated the cell-cycle phases of the tested HeLa cells from the amount of core histone proteins.
The heterogeneity of exosome populations with distinct nanoscale sizes has impeded our understanding of their functions as intercellular communication agents. Profiling the signaling proteins packaged in each size-dependent subtype can disclose this heterogeneity. Herein, a new strategy was developed for deconstructing the heterogeneity of distinct-size urine exosome subpopulations by profiling N-glycoproteomics and phosphoproteomics simultaneously. Two-dimensional size exclusion liquid chromatography (SEC) was used to isolate large exosomes (L-Exo), medium exosomes (M-Exo), and small exosomes (S-Exo) from human urine samples. Then, a hydrophilic carbonyl-functionalized magnetic zirconium-organic framework (CFMZOF) was developed as a probe for capturing the two kinds of post-translational modification (PTM) peptides simultaneously. Finally, liquid chromatography-tandem mass spectrometry (LC-MS/MS) combined with database search was used to characterize PTM protein contents. We identified 144 glycoproteins and 44 phosphoproteins from L-Exo, 156 glycoproteins and 46 phosphoproteins from M-Exo, and 134 glycoproteins and 10 phosphoproteins from S-Exo. The ratio of proteins with simultaneous glycosylation and phosphorylation is 11%, 9%, and 3% in L-Exo, M-Exo, and S-Exo, respectively. Based on label-free quantification intensity results, both principal component analysis and Pearson's correlation coefficients indicate that distinct-size exosome subpopulations differ significantly in PTM protein contents. Analysis of high-abundance PTM proteins in each exosome subset reveals that the preferentially packaged PTM proteins in L-Exo, M-Exo, and S-Exo are associated with immune response, biological metabolism, and molecule transport processes, respectively. Our PTM proteomics study based on size-dependent exosome subtypes opens a new avenue for deconstructing the heterogeneity of exosomes.