We have witnessed the rapid development of information technology in recent years. One of the key phenomena is the fast, near-exponential growth of data. Consequently, most traditional data classification methods fail to meet the dynamic, real-time demands of today's data processing and analysis, especially for continuous data streams. This paper proposes an improved incremental learning algorithm for large-scale data streams, based on the Support Vector Machine (SVM) and named DS-IILS. DS-IILS takes the load condition of the entire system and the performance of individual nodes into consideration to improve efficiency. The algorithm defines a threshold on the distance to the optimal separating hyperplane: samples from both the history sample set and the incremental sample set that fall within this threshold are retained, and these retained samples form the training sample set. To build a more accurate classifier, the differing sizes of the history and incremental sample sets are handled by weighted processing. Finally, the algorithm is implemented in a cloud computing system and applied to the study of user behavior. Experimental results are provided and compared with other incremental learning algorithms. The results show that DS-IILS improves training efficiency while maintaining relatively high classification accuracy, which is consistent with the theoretical analysis.
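To make the retention step concrete, here is a minimal sketch of the threshold-based sample selection idea, assuming a linear SVM and a simple inverse-size weighting rule. The function and parameter names are illustrative, and the paper's distributed scheduling and exact weighting scheme are not reproduced.

```python
# A minimal sketch of threshold-based sample retention for incremental
# SVM training (hypothetical names; not the paper's exact DS-IILS code).
import numpy as np
from sklearn.svm import SVC

def incremental_update(history_X, history_y, inc_X, inc_y,
                       threshold=1.0, history_weight=0.5):
    """Retrain an SVM on samples near the current separating hyperplane."""
    # Fit an initial classifier on the history set.
    clf = SVC(kernel="linear")
    clf.fit(history_X, history_y)

    # Geometric distance to the hyperplane = |decision_function| / ||w||.
    w_norm = np.linalg.norm(clf.coef_)
    def near_margin(X):
        return np.abs(clf.decision_function(X)) / w_norm <= threshold

    keep_h = near_margin(history_X)
    keep_i = near_margin(inc_X)

    X = np.vstack([history_X[keep_h], inc_X[keep_i]])
    y = np.concatenate([history_y[keep_h], inc_y[keep_i]])

    # Weighted processing: compensate for unequal set sizes with a
    # simple inverse-size weighting (assumed here for illustration).
    weights = np.concatenate([
        np.full(keep_h.sum(), history_weight / max(keep_h.sum(), 1)),
        np.full(keep_i.sum(), (1 - history_weight) / max(keep_i.sum(), 1)),
    ])

    new_clf = SVC(kernel="linear")
    new_clf.fit(X, y, sample_weight=weights)
    return new_clf
```

The intuition is standard for incremental SVMs: only samples near the hyperplane are candidates to become support vectors, so carrying just those forward keeps the training set small as the stream grows.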
This paper studies the role of sentiment dispersion in the stock market. We extract investor sentiment from tweets that specifically express opinions on stocks. Naïve Bayes is then used to assign each tweet a conditional probability representing how positive it is. We do not discretize this probability, so as to reduce information loss. Sentiment dispersion is then measured as the standard deviation of these probabilities. The resulting sentiment dispersion is correlated with future stock returns and realized volatility. This research shows whether sentiment dispersion contains information about future returns and volatility, which is helpful in formulating investment strategies.
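A minimal sketch of this pipeline follows: score each tweet's positivity with Naive Bayes, keep the raw probability (no discretization), and take the standard deviation as the dispersion measure. The training tweets and labels below are hypothetical placeholders.

```python
# Sketch of the sentiment-dispersion pipeline (illustrative data only).
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical labeled tweets (1 = positive, 0 = negative).
train_tweets = ["great earnings, buying more", "terrible guidance, selling"]
train_labels = [1, 0]

vectorizer = CountVectorizer()
clf = MultinomialNB()
clf.fit(vectorizer.fit_transform(train_tweets), train_labels)

# Tweets about a given stock on a given day.
day_tweets = ["strong quarter ahead", "overvalued, expecting a drop"]
pos_idx = list(clf.classes_).index(1)
p_positive = clf.predict_proba(vectorizer.transform(day_tweets))[:, pos_idx]

mean_sentiment = np.mean(p_positive)        # average positivity
sentiment_dispersion = np.std(p_positive)   # the paper's dispersion measure
```

The daily dispersion series would then be regressed against next-period returns and realized volatility.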
Complex objects usually carry multiple labels and can be represented by multiple modalities; e.g., a complex article contains both text and image information as well as multiple annotations. Previous methods assume that the homogeneous multi-modal data are consistent, whereas in real applications the raw data are disordered; e.g., an article consists of a variable number of inconsistent text and image instances. Multi-modal Multi-instance Multi-label (M3) learning provides a framework for handling such tasks and has exhibited excellent performance. However, M3 learning faces two main challenges: 1) how to effectively utilize label correlation, and 2) how to take advantage of multi-modal learning to process unlabeled instances. To address these problems, we first propose a novel Multi-modal Multi-instance Multi-label Deep Network (M3DN), which casts M3 learning as an end-to-end multi-modal deep network and enforces a consistency principle among the bag-level predictions of different modalities. Building on M3DN, we learn the latent ground label metric with optimal transport. Moreover, we introduce extrinsic unlabeled multi-modal multi-instance data and propose M3DNS, which adds an instance-level auto-encoder for each single modality and a modified bag-level optimal transport to strengthen consistency among modalities. Thereby, M3DNS can better predict labels and exploit label correlation simultaneously. Experiments on benchmark datasets and the real-world WKG Game-Hub dataset validate the effectiveness of the proposed methods.
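To illustrate the bag-level consistency principle, here is a minimal sketch in which each modality scores a variable-length bag of instances, pools to a bag-level prediction, and the two modalities are encouraged to agree. The architecture and dimensions are hypothetical simplifications; the paper's optimal-transport label metric is not reproduced.

```python
# Sketch of cross-modal bag-level consistency (illustrative, not M3DN itself).
import torch
import torch.nn as nn
import torch.nn.functional as F

class BagScorer(nn.Module):
    """Scores instances of one modality and max-pools to bag level."""
    def __init__(self, in_dim, n_labels):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                 nn.Linear(128, n_labels))

    def forward(self, bag):                       # bag: (n_instances, in_dim)
        return self.net(bag).max(dim=0).values    # bag-level logits: (n_labels,)

text_scorer, image_scorer = BagScorer(300, 10), BagScorer(512, 10)

def m3_loss(text_bag, image_bag, labels):         # labels: (n_labels,) floats
    p_text = text_scorer(text_bag)
    p_image = image_scorer(image_bag)
    # Supervised multi-label loss on each modality's bag prediction.
    sup = (F.binary_cross_entropy_with_logits(p_text, labels) +
           F.binary_cross_entropy_with_logits(p_image, labels))
    # Consistency principle: the modalities' bag-level predictions should agree.
    consistency = F.mse_loss(torch.sigmoid(p_text), torch.sigmoid(p_image))
    return sup + consistency
```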
Fault detection plays a crucial role in wireless sensor networks (WSNs). Many fault detection approaches requiring a priori knowledge of network faults have been proposed to distinguish faulty sensors by exploring spatial-temporal correlations among sensor readings. However, faulty sensors that do not generate anomalous readings, as well as potential failures with unknown types and symptoms, remain undetected. In this paper, we propose a Metric-Correlation-Based Fault Detection (MCFD) approach using clustering analysis. It is motivated by the fact that the system metric correlations of most fault-free sensors show strong similarities, whereas differing patterns of such correlations indicate potential failures. MCFD captures the internal metric correlations inside each sensor as a correlation value view. An improved Neighbor-based Local Density Clustering Analysis (NLDCA) algorithm, based on the Neighbor-based Local Density Factor (NLDF), is applied in the spatial domain to cluster similar correlation value views together, so that potentially faulty sensors whose abnormal views do not belong to any cluster can be detected. Simulation results demonstrate that the MCFD approach achieves higher detection accuracy and a lower false positive rate, even under high node failure ratios and dense distribution conditions.
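A minimal sketch of the correlation-value-view idea follows: build each sensor's view from its pairwise metric correlations, then flag sensors whose views fall in low-density regions. Here scikit-learn's Local Outlier Factor stands in for the paper's NLDCA/NLDF algorithm, which is not reproduced.

```python
# Sketch of metric-correlation views plus density-based outlier flagging.
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

def correlation_view(metrics):
    """metrics: (n_samples, n_metrics) readings of one sensor's system metrics.
    Returns the upper-triangular pairwise Pearson correlations as a vector."""
    corr = np.corrcoef(metrics, rowvar=False)
    iu = np.triu_indices_from(corr, k=1)
    return corr[iu]

def detect_faulty(sensor_metrics, n_neighbors=5):
    """sensor_metrics: list of (n_samples, n_metrics) arrays, one per sensor."""
    views = np.array([correlation_view(m) for m in sensor_metrics])
    lof = LocalOutlierFactor(n_neighbors=n_neighbors)
    flags = lof.fit_predict(views)          # -1 marks a local-density outlier
    return np.where(flags == -1)[0]         # indices of suspect sensors
```

The key property exploited is that fault-free sensors share similar correlation structure, so their views cluster densely while faulty sensors' views do not.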
Complex intelligent systems, such as those for tackling the COVID-19 pandemic, involve multiple multivariate time series (MTSs), where both target variables (such as COVID-19 infected, confirmed, and recovered cases) and external factors (such as virus mutation and infectivity, vaccination, and government interventions) are coupled. Forecasting such MTSs with multiple external MTS factors requires modeling both within- and between-MTS interactions and handling their uncertainty, heterogeneity, and dynamics. Existing shallow to deep MTS modelers, including regressors, deep recurrent neural networks such as DeepAR, deep state space models, and deep factor models, do not jointly characterize these issues in a probabilistic manner across MTSs. We propose MTSNet, an end-to-end deep probabilistic cross-MTS learning network. MTSNet takes a tensor input composed of scaled target and external MTSs. It then vertically and horizontally stacks long short-term memory (LSTM) networks for encoding and decoding the target MTSs, and enhances uncertainty modeling, generalization, and forecasting robustness through residual connections, variational zoneout, and probabilistic forecasting. The tensor input is projected to a probability distribution for target MTS forecasting. MTSNet outperforms state-of-the-art deep probabilistic MTS networks in forecasting COVID-19 confirmed cases and ICU patient numbers for six countries by incorporating virus mutation, vaccination, government interventions, and infectivity.
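To show the probabilistic forecasting head in isolation, here is a minimal sketch in which an LSTM consumes the scaled target-plus-external tensor input and projects each step to the parameters of a Gaussian predictive distribution. The residual connections and variational zoneout from the paper are omitted, and all dimensions are hypothetical.

```python
# Sketch of LSTM-based probabilistic MTS forecasting (illustrative only).
import torch
import torch.nn as nn

class ProbForecaster(nn.Module):
    def __init__(self, n_series, hidden=64):
        super().__init__()
        self.encoder = nn.LSTM(n_series, hidden, num_layers=2,
                               batch_first=True)
        self.mu = nn.Linear(hidden, 1)        # predictive mean
        self.sigma = nn.Linear(hidden, 1)     # predictive scale (via softplus)

    def forward(self, x):                     # x: (batch, time, n_series)
        h, _ = self.encoder(x)
        mu = self.mu(h).squeeze(-1)
        sigma = nn.functional.softplus(self.sigma(h)).squeeze(-1) + 1e-6
        return torch.distributions.Normal(mu, sigma)

def nll_loss(model, x, y):                    # y: (batch, time) target series
    dist = model(x)
    return -dist.log_prob(y).mean()           # maximize predictive likelihood
```

Training against the negative log-likelihood, rather than a point-forecast error, is what makes the forecast a full distribution from which intervals can be drawn.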
Machine learning (ML) technologies have achieved significant success in various downstream tasks, e.g., node classification, link prediction, community detection, graph classification, and graph clustering. However, many studies have shown that models built upon ML technologies are vulnerable to noise and adversarial attacks. A number of works have studied robust models against noise or adversarial examples in the image and text processing domains; learning robust models in graph domains, however, is more challenging. Adding noise or perturbations to graph data makes robustness even harder to achieve, since perturbations of edges or node attributes easily propagate to neighbors via the relational structure of the graph. In this paper, we investigate and summarize existing works that study robust deep learning models against adversarial attacks or noise on graphs, namely robust learning (models) on graphs. Specifically, we first present evaluation metrics for model robustness on graphs. Then, we provide a comprehensive taxonomy that groups robust models on graphs into five categories: anomaly detection, adversarial training, pre-processing, attention mechanism, and certifiable robustness. Besides, we highlight some promising future directions in learning robust models on graphs. We hope this work offers insights to relevant researchers and assists their studies.
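As a concrete instance of one taxonomy category, adversarial training, here is a minimal sketch that perturbs node attributes of a simple one-layer graph convolution with an FGSM-style step and trains on both clean and perturbed features. It is illustrative only; the surveyed methods differ in model and perturbation details.

```python
# Sketch of adversarial training on node attributes of a graph model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNLayer(nn.Module):
    def __init__(self, in_dim, n_classes):
        super().__init__()
        self.lin = nn.Linear(in_dim, n_classes)

    def forward(self, A_hat, X):              # A_hat: normalized adjacency
        return self.lin(A_hat @ X)            # propagate, then classify

def adversarial_step(model, A_hat, X, y, optimizer, eps=0.01):
    # FGSM-style perturbation of node attributes.
    X_adv = X.clone().requires_grad_(True)
    loss = F.cross_entropy(model(A_hat, X_adv), y)
    grad = torch.autograd.grad(loss, X_adv)[0]
    X_pert = (X + eps * grad.sign()).detach()
    # Train on both clean and perturbed features.
    optimizer.zero_grad()
    total = (F.cross_entropy(model(A_hat, X), y) +
             F.cross_entropy(model(A_hat, X_pert), y))
    total.backward()
    optimizer.step()
    return total.item()
```

Note how the perturbation on one node's attributes reaches its neighbors through `A_hat @ X`, which is exactly the propagation effect that makes graph robustness harder than in image or text domains.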