Multivariate time series (MTS) are collected for different variables when studying scientific phenomena or monitoring system health, where each time series records the values of one variable over a time period. Among the different variables, it is common that only a few contribute significantly to a specific phenomenon. Furthermore, the variables contributing significantly to different phenomena are often different. We denote the variables that contribute to the occurrences of different phenomena as Phenomenon-specific Variables (PVs). In this paper, we formulate a novel problem of identifying significant PVs from MTS datasets. Feature extraction techniques for analyzing MTS data have been extensively studied. However, most of them identify important global features for one dataset and do not utilize the temporal order of time series. To solve the newly introduced problem, we propose a solution framework, CNN_mts-X, which is a new variant of Convolutional Neural Networks (CNN) and can embed other feature extraction techniques (as X). Furthermore, we design a CNN_mts-LR method that implements a new feature identification approach (LR) as X in the CNN_mts-X framework. The LR method leverages both Linear Discriminant Analysis (LDA) and Random Forest (RF). Our extensive experiments on five real datasets show that the CNN_mts-LR method performs much better than several baseline methods. Using 30 percent of the PVs discovered by CNN_mts-LR, classification achieves performance better than or similar to that obtained using all the variables.
Widespread placement and the high data sampling rate of the current generation of phasor measurement units (PMUs) in wide area monitoring systems result in a huge amount of data to be analyzed and stored, making efficient storage of such data a priority. This paper presents a generalized compression technique that utilizes the inherent correlation within PMU data by exploiting both spatial and temporal redundancies. A two-stage compression algorithm is proposed, using principal component analysis in the first stage and the discrete cosine transform in the second. Since compression parameters need to be adjusted to compress critical disturbance information with high fidelity, an automated but simple statistical change detection technique is proposed to identify disturbance data. Extensive verifications are performed using field data as well as simulated data to establish the generality and superior performance of the method.
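The two-stage idea can be illustrated with a minimal sketch, which is not the authors' algorithm. It assumes a single retained principal component (found by power iteration) to capture redundancy across channels, followed by an orthonormal DCT of the component scores with only the first `keep` coefficients stored. The function names, the `keep` parameter, and the use of one component are assumptions made for illustration; the disturbance-dependent parameter adjustment mentioned in the abstract is not shown.

```python
import math

def dct(x):
    """Orthonormal DCT-II of a sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        out.append(s * math.sqrt((1 if k == 0 else 2) / n))
    return out

def idct(c):
    """Inverse of dct() (orthonormal DCT-III)."""
    n = len(c)
    return [sum(c[k] * math.sqrt((1 if k == 0 else 2) / n)
                * math.cos(math.pi * (t + 0.5) * k / n) for k in range(n))
            for t in range(n)]

def top_component(rows, iters=200):
    """Leading principal direction of centred rows via power iteration."""
    n = len(rows[0])
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        # w = X^T (X v), which is proportional to the covariance matrix times v
        proj = [sum(r[j] * v[j] for j in range(n)) for r in rows]
        w = [sum(p * r[j] for p, r in zip(proj, rows)) for j in range(n)]
        norm = math.sqrt(sum(a * a for a in w))
        if norm == 0.0:
            break
        v = [a / norm for a in w]
    return v

def compress(rows, keep):
    """rows: T samples, each a list of N channel values (hypothetical layout)."""
    t_len, n = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / t_len for j in range(n)]
    centred = [[r[j] - means[j] for j in range(n)] for r in rows]
    v = top_component(centred)                                        # stage 1: spatial
    scores = [sum(r[j] * v[j] for j in range(n)) for r in centred]
    return means, v, dct(scores)[:keep], t_len                        # stage 2: temporal

def decompress(means, v, coeffs, t_len):
    """Zero-pad the kept coefficients, invert the DCT, and undo the projection."""
    scores = idct(coeffs + [0.0] * (t_len - len(coeffs)))
    return [[means[j] + s * v[j] for j in range(len(v))] for s in scores]
```

For strongly correlated channels with slowly varying content, the stored data shrink from T x N samples to N means, N component weights, and `keep` coefficients. Reconstruction error grows as channels decorrelate or as fewer coefficients are kept.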
Data recorded by Phasor Measurement Units (PMUs) contains noise. This paper characterizes and quantifies this noise for voltage, current and frequency data recorded at three different voltage levels. The probability distribution of the measurement noise and its typical power are identified. The PMU noise quantification can help in generation of experimental PMU data in close conformity with field PMU data, bad data removal, missing data prediction, and effective design of statistical filters for noise rejection.
Detecting Influence Relationships from Graphs. Chuan Hu, Huiping Cao, and Chaomin Ke. Proceedings of the 2014 SIAM International Conference on Data Mining (SDM), pp. 821-829, eISBN 978-1-61197-344-0. https://doi.org/10.1137/1.9781611973440.94 Abstract: Graphs have been widely used to represent objects and object connections in applications such as the Web, social networks, and citation networks. Mining influence relationships from graphs has gained interest in recent years because providing influence information about the object connections in graphs can facilitate graph exploration, graph search, and connection recommendations. In this paper, we study the problem of detecting influence aspects, on which objects are connected, and influence degree (or influence strength), with which one graph node influences another graph node on a given aspect. Existing techniques focus on inferring either the influence degrees or the influence types from graphs. We propose two generative Aspect Influence Models, OAIM and LAIM, to detect both influence aspects and influence degrees. These models utilize the topological structure of the graphs, the text content associated with objects, and the context in which the objects are connected. We compare these two models with one baseline approach that considers only the text content associated with objects. The empirical studies on citation graphs and networks of users from Twitter show that our models can discover more effective results than the baseline approach. Key words: graph, influence aspect, influence degree, probabilistic generative model, Gibbs sampling
A solution of a keyword query over graphs is a Group Steiner tree, which is rooted at a node, whose nodes collectively satisfy the query (e.g., node keywords cover all the query keywords), and in which the sum of edge weights satisfies given conditions (e.g., the sum must be minimum, or among the K smallest, over all possible sub-graphs satisfying the query). Most existing techniques for evaluating keyword queries over graphs run on a centralized computer. We propose a new approach, SOverlapping, to evaluate keyword queries over graphs on the MapReduce framework by utilizing probability theory to partition graphs. The new approach has been shown to be effective and efficient when tested on real graph data sets.
Animal welfare monitoring relies on sensor accuracy for detecting changes in animal well-being. We compared distance calculations based on global positioning system (GPS) data alone or combined with motion data from triaxial accelerometers. The assessment involved static trackers placed outdoors or indoors vs. trackers mounted on cows grazing on pasture. Trackers communicated motion data at 1 min intervals and GPS positions at 15 min intervals for seven days. Daily distance walked was determined using the following: (1) raw GPS data (RawDist), (2) data with erroneous GPS locations removed (CorrectedDist), or (3) data with erroneous GPS locations removed, combined with the exclusion of GPS data associated with no motion reading (CorrectedDist_Act). Distances were analyzed via one-way ANOVA to compare the effects of tracker placement (Indoor, Outdoor, or Animal). No difference was detected among the tracker placements for RawDist. CorrectedDist differed among the tracker placements. However, due to the random error of GPS measurements, CorrectedDist for Indoor static trackers differed from zero. The walking distance calculated by CorrectedDist_Act differed among the tracker placements, with distances for static trackers not differing from zero. The fusion of GPS and accelerometer data better detected animal welfare implications related to immobility in grazing cattle.
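The three distance measures can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's procedure. It assumes a fix counts as erroneous when it lies more than `max_step_m` metres from the last accepted fix, and that each fix carries a `moved` flag saying whether the accelerometer reported motion since the previous fix. The threshold, the flag, and all function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres

def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def path_length_m(points):
    """Sum of distances between consecutive points."""
    return sum(haversine_m(a, b) for a, b in zip(points, points[1:]))

def drop_outliers(fixes, max_step_m):
    """Drop fixes farther than max_step_m from the last kept fix (assumed rule)."""
    kept = fixes[:1]
    for fix in fixes[1:]:
        if haversine_m(kept[-1], fix) <= max_step_m:
            kept.append(fix)
    return kept

def daily_distances(fixes, max_step_m=500.0):
    """fixes: list of (lat, lon, moved) tuples in time order for one day."""
    raw = path_length_m(fixes)                       # RawDist analogue
    cleaned = drop_outliers(fixes, max_step_m)
    corrected = path_length_m(cleaned)               # CorrectedDist analogue
    # Keep the first fix as an anchor, then only fixes with a motion reading.
    active = cleaned[:1] + [f for f in cleaned[1:] if f[2]]
    corrected_act = path_length_m(active)            # CorrectedDist_Act analogue
    return raw, corrected, corrected_act
```

For a static tracker, GPS jitter still accumulates a nonzero corrected distance, while gating on the motion flag leaves no segments to sum. This mirrors the pattern the abstract reports for static trackers.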