    Abstract:
    In this paper, we introduce RaVAEn, a lightweight, unsupervised approach for change detection in satellite data based on Variational Auto-Encoders (VAEs), designed specifically for on-board deployment. Applications such as disaster management benefit enormously from the rapid availability of satellite observations. Traditionally, data analysis is performed on the ground after all data is transferred - downlinked - to a ground station. Constraints on the downlink capability therefore affect any downstream application. In contrast, RaVAEn pre-processes the sampled data directly on the satellite and flags changed areas to prioritise for downlink, shortening the response time. We verified the efficacy of our system on a dataset composed of time series of catastrophic events - which we plan to release alongside this publication - demonstrating that RaVAEn outperforms pixel-wise baselines. Finally, we tested our approach on resource-limited hardware to assess its computational and memory limitations.
    Keywords:
    Ground segment
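    As a rough, hedged sketch of latent-space change detection of the kind described above, the fragment below encodes two co-registered tiles with a small VAE-style encoder (PyTorch) and uses the distance between latent means as a change score. The architecture, tile size, band count and threshold are illustrative assumptions; this is not the RaVAEn implementation.

        # Illustrative sketch only: latent-space change scoring with a small VAE encoder.
        # The encoder architecture, tile size and threshold are assumptions, not RaVAEn's.
        import torch
        import torch.nn as nn

        class TinyVAEEncoder(nn.Module):
            def __init__(self, in_channels=4, latent_dim=32):
                super().__init__()
                self.conv = nn.Sequential(
                    nn.Conv2d(in_channels, 16, 3, stride=2, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                    nn.Flatten(),
                )
                self.fc_mu = nn.Linear(32 * 8 * 8, latent_dim)      # assumes 32x32 tiles
                self.fc_logvar = nn.Linear(32 * 8 * 8, latent_dim)

            def forward(self, x):
                h = self.conv(x)
                return self.fc_mu(h), self.fc_logvar(h)

        def change_score(encoder, tile_before, tile_after):
            """Distance between the latent means of two co-registered tiles."""
            with torch.no_grad():
                mu_b, _ = encoder(tile_before.unsqueeze(0))
                mu_a, _ = encoder(tile_after.unsqueeze(0))
            return torch.norm(mu_a - mu_b, dim=1).item()

        encoder = TinyVAEEncoder()
        before = torch.rand(4, 32, 32)   # 4-band 32x32 tile (placeholder data)
        after = torch.rand(4, 32, 32)
        flag_for_downlink = change_score(encoder, before, after) > 1.0  # arbitrary threshold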
    Pervasive sensing is one of the most prominent technologies being adopted by the current process industry. Process plants are extensively equipped with wireless sensors for monitoring processes in locations where human intervention must be limited. The major challenge with these numerous sensors is therefore storing and analysing the large volume of sensor data streams they produce. This paper focuses on sensor data analysis and anomaly detection specific to the process sector, because the placement of these sensors and the nature of the data they generate follow a specific pattern during the process flow. This data is more structured than other types of big data, which tend to be unstructured. There is no assurance that any single algorithm can produce optimal results, so this paper presents a generic framework with an ensemble of methods drawn from probability and statistics, neural networks, and clustering. The neural network is a supervised learning model that predicts new data based on the training data, but it can mispredict unseen data; for that reason, clustering is used as an unsupervised learning model to handle concept drift in the sensor data stream efficiently. These solutions are applied to various data scenarios with practical means to improve the prediction and anomaly detection accuracy for equipment as well as process flows. To the best of our knowledge, no single framework is available to fully analyse sensor data streams in terms of independent, correlation-based, and group-wise analysis with respect to process-flow segmentation and process and sub-process hierarchy analysis.
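    As a hedged illustration of the ensemble idea above (a supervised neural network for prediction combined with unsupervised clustering to catch readings the network was never trained on), the following scikit-learn sketch flags samples far from all known clusters as potential drift. The models, features and thresholds are placeholders, not the paper's framework.

        # Illustrative sketch: supervised MLP for prediction, clustering as a guard
        # against concept drift (samples far from known clusters are flagged as anomalies).
        import numpy as np
        from sklearn.neural_network import MLPClassifier
        from sklearn.cluster import KMeans

        rng = np.random.default_rng(0)
        X_train = rng.normal(size=(500, 8))          # placeholder sensor features
        y_train = (X_train[:, 0] > 0).astype(int)    # placeholder labels

        predictor = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500).fit(X_train, y_train)
        clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X_train)

        # Drift threshold: 95th percentile of training distances to the nearest centroid.
        train_dist = np.min(clusters.transform(X_train), axis=1)
        threshold = np.percentile(train_dist, 95)

        def analyse(sample):
            """Return (prediction, is_drift) for one incoming sensor reading."""
            dist = np.min(clusters.transform(sample.reshape(1, -1)), axis=1)[0]
            pred = predictor.predict(sample.reshape(1, -1))[0]
            return pred, dist > threshold

        pred, drift = analyse(rng.normal(size=8) * 5)  # an unusually scaled reading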
    In today's digital era, data are everywhere, from the Internet of Things to health care and financial applications. This leads to potentially unbounded, ever-growing Big data streams that need to be utilized effectively. Data normalization is an important preprocessing technique for data analytics. It helps prevent mismodeling and reduces the complexity inherent in the data, especially for data integrated from multiple sources and contexts. Normalizing a Big data stream is challenging because of evolving inconsistencies, time and memory constraints, and the unavailability of the whole dataset beforehand. This paper proposes a distributed approach to adaptive normalization for Big data streams. Using sliding windows of fixed size, it provides a simple mechanism to adapt the statistics for normalizing the changing data in each window. Implemented on Apache Storm, a distributed real-time stream processing framework, our approach exploits distributed data processing for efficient normalization. Unlike other existing adaptive approaches, ours does not normalize data for one specific use (e.g., classification). Moreover, our adaptive mechanism allows flexible control, via user-specified thresholds, of the trade-off between time and precision in normalization. The paper illustrates the proposed approach, along with a few other techniques, through experiments on both synthesized and real-world data. On 160,000 instances of a data stream, the normalized data obtained from our approach improve over the baseline by 89%, with a root-mean-square error of 0.0041 relative to the actual data.
    Normalization
    Database normalization
    Data pre-processing
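    A minimal sketch of the sliding-window idea follows, assuming simple min-max statistics that are refreshed only when the window range drifts beyond a user-set tolerance (mirroring the time/precision trade-off mentioned above). The window size and tolerance are placeholders, and this is plain Python rather than the Apache Storm implementation.

        # Illustrative sketch: adaptive min-max normalization over fixed-size sliding windows.
        # Statistics are re-estimated only when the window range drifts beyond a tolerance.
        from collections import deque

        def adaptive_normalize(stream, window_size=100, tolerance=0.05):
            window = deque(maxlen=window_size)
            lo, hi = None, None
            for x in stream:
                window.append(x)
                w_lo, w_hi = min(window), max(window)
                # Refresh statistics only on sufficient drift, trading a little precision
                # for fewer updates (the time/precision trade-off).
                if lo is None or abs(w_lo - lo) > tolerance * (hi - lo + 1e-9) \
                              or abs(w_hi - hi) > tolerance * (hi - lo + 1e-9):
                    lo, hi = w_lo, w_hi
                span = (hi - lo) or 1e-9
                yield (x - lo) / span

        normalized = list(adaptive_normalize([5, 7, 9, 50, 52, 60, 3], window_size=3))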
    Remote sensing streams a continuous data feed from the satellite to the ground station for data analysis. Often the analytics involves analyzing data in real time, for example in emergency control, surveillance of military operations, or scenarios that change rapidly. Traditional data mining requires all the data to be available before a model is induced by supervised learning for automatic image recognition or classification. Any update to the data prompts the model to be rebuilt by loading in all the previous and new data; the training time therefore increases indefinitely, making this approach unsuitable for real-time applications in remote sensing. As a contribution to solving this problem, a new data analytics approach for data stream mining in remote sensing is formulated and reported in this paper. The fresh data feed collected from afar is used to approximate an image recognition model without reloading the history, which eliminates the latency of building the model again and again. In the past, data stream mining had a drawback in approximating a classification model with a sufficiently high level of accuracy, owing to the one-pass incremental learning mechanism inherent in the design of data stream mining algorithms. To solve this problem, a novel streamlined sensor data processing method is proposed, called the evolutionary expand-and-contract instance-based learning (EEAC-IBL) algorithm. The multivariate data stream is first expanded into many subspaces, and the subspaces corresponding to the characteristics of the features are then selected and condensed into a significant feature subset. The selection operates stochastically rather than deterministically, using evolutionary optimization to approximate the best subgroup. The model for image recognition is then learned on the fly by data stream mining. This stochastic approximation method is fast and accurate, offering an alternative to traditional machine learning for image recognition applications in remote sensing. Our experimental results show computational advantages over other classical approaches, with a mean accuracy improvement of 16.62%.
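    The expand-and-contract idea of stochastically searching feature subsets and then learning in one pass can be sketched as follows. A simple random search stands in for the evolutionary optimizer and scikit-learn's partial_fit for the incremental learner; all names and settings are assumptions, not the EEAC-IBL algorithm itself.

        # Illustrative sketch: stochastic feature-subset selection followed by
        # incremental (one-pass) learning on a data stream.
        import numpy as np
        from sklearn.linear_model import SGDClassifier

        rng = np.random.default_rng(1)
        X = rng.normal(size=(2000, 20))
        y = (X[:, 3] + X[:, 7] > 0).astype(int)   # only two features actually matter

        def score_subset(mask, X_val, y_val):
            clf = SGDClassifier(max_iter=5, tol=None).fit(X_val[:, mask], y_val)
            return clf.score(X_val[:, mask], y_val)

        # "Expand": sample many candidate subsets; "contract": keep the best one.
        best_mask, best_score = None, -1.0
        for _ in range(30):
            mask = rng.random(20) < 0.3
            if mask.sum() == 0:
                continue
            s = score_subset(mask, X[:200], y[:200])
            if s > best_score:
                best_mask, best_score = mask, s

        # One-pass incremental learning on the rest of the stream with the chosen subset.
        model = SGDClassifier()
        classes = np.array([0, 1])
        for start in range(200, len(X), 100):          # 100-sample mini-batches
            batch = slice(start, start + 100)
            model.partial_fit(X[batch][:, best_mask], y[batch], classes=classes)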
    Change detection is the process of identifying differences in the state of an object or phenomenon by observing it at different times or different locations in space. In the streaming context, it is the process of segmenting a data stream into different segments by identifying the points where the stream dynamics change. Decentralized change detection can be used in many interesting and important applications, such as environmental observing systems and health-care monitoring systems. Although there is a great deal of work on distributed detection and data fusion, most of it focuses on one-time change detection solutions, which process the data only once in response to a change occurring. The trade-offs of continuous distributed detection of changes include detection accuracy, space efficiency, detection delay, and communication efficiency. To address these goals, a wildfire warning system is used as the motivating scenario. Starting from the challenges and requirements of the wildfire warning system, change detection algorithms for streaming data are proposed as part of the solution. By selecting different models of local change detection, different schemes for distributed change detection, and different data exchange protocols, different designs can be achieved. Based on this approach, the contributions of this dissertation are as follows. A general two-window framework for detecting changes in a single data stream is presented. A general synopsis-based change detection framework is proposed; theoretical and empirical analysis shows that the detection performance of a synopsis-based detector is similar to that of a non-synopsis detector if a distance function quantifying the changes is preserved under the construction of the synopsis. A clustering-based change detection and clustering maintenance method over a sliding window is presented; the clustering-based detector can automatically detect changes in multivariate streaming data. A framework for decentralized change detection in wireless sensor networks is proposed. Finally, a distributed framework for clustering streaming data is proposed by extending the two-phased stream clustering approach that is widely used to cluster a single data stream.
    Streaming Data
    Sensor Fusion
    Concept Drift
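    The two-window framework can be illustrated with a small sketch: a reference window and a current window slide over the stream, and a change is flagged when the distance between their summaries exceeds a threshold. Here the summary is a simple difference of means; the frameworks in the dissertation use richer synopses and distance functions, so treat this only as a toy instance.

        # Illustrative sketch of a two-window change detector: compare a reference window
        # against the most recent window and flag a change when their means diverge.
        from collections import deque
        import statistics

        def two_window_detector(stream, window_size=50, threshold=1.0):
            reference = deque(maxlen=window_size)
            current = deque(maxlen=window_size)
            for i, x in enumerate(stream):
                if len(reference) < window_size:
                    reference.append(x)      # fill the reference window first
                    continue
                current.append(x)
                if len(current) == window_size and \
                        abs(statistics.fmean(current) - statistics.fmean(reference)) > threshold:
                    yield i                  # index where a change is flagged
                    reference = deque(current, maxlen=window_size)
                    current.clear()

        stream = [0.0] * 200 + [5.0] * 200   # abrupt mean shift halfway through
        change_points = list(two_window_detector(stream))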
    Space-ground aided cooperative spectrum monitoring, which combines the benefits of satellite components and terrestrial components to improve monitoring accuracy and enlarge the monitored area, has become an emerging application of space-ground integrated networks (SGIN). However, only a short transmission window is usually available for a satellite component to connect with the ground gateway, meaning only a limited transmission time is allowed for the satellite to upload the collected spectrum data. On the other hand, considerable redundancy may exist in the spectrum data collected by a single sensor during one collection period, which further reduces the data uploading efficiency. In this paper, we investigate similar-data detection, a matching problem for comparing two data items, which is important for the subsequent data compression used to improve uploading efficiency. First, the definition of the sharing fragment set is given. Then a metric is presented to measure the redundancy of one data item with respect to another. We propose a Sharing Fragment Set (SFS) algorithm that selects a good sharing fragment set. Theoretical analysis shows that the proposed SFS algorithm is well suited to determining the redundancy between data items. In addition, we conduct an experiment on a randomly produced synthetic dataset. Numerical results show that the SFS algorithm selects sharing fragment sets better than the Greedy-String-Tiling (GST) and simple greedy algorithms.
    Upload
    Data redundancy
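    To make the notion of a sharing fragment set concrete, the sketch below greedily tiles one byte string with the longest fragments it shares with another and reports the resulting redundancy ratio. This corresponds to the simple greedy baseline mentioned above, not the proposed SFS algorithm, and all names and parameters are placeholders.

        # Illustrative sketch: greedily cover `data` with the longest fragments shared with
        # `reference`, then report how much of `data` is redundant w.r.t. `reference`.
        def shared_fragments(data: bytes, reference: bytes, min_len: int = 4):
            fragments, i = [], 0
            while i < len(data):
                best = 0
                # Longest fragment starting at position i that also occurs in `reference`.
                for length in range(len(data) - i, min_len - 1, -1):
                    if data[i:i + length] in reference:
                        best = length
                        break
                if best:
                    fragments.append((i, data[i:i + best]))
                    i += best
                else:
                    i += 1
            return fragments

        def redundancy(data: bytes, reference: bytes) -> float:
            covered = sum(len(frag) for _, frag in shared_fragments(data, reference))
            return covered / len(data) if data else 0.0

        ratio = redundancy(b"spectrum-sample-AAAB-spectrum-sample-AAAC",
                           b"spectrum-sample-AAAB")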
    Obtaining an accurate estimate of a land-cover classifier's performance over a wide geographic area is a challenging problem due to the need to generate the ground truth that represents the entire area, which may be thousands of square kilometers in size. The current best approach for solving this problem constructs a test set by drawing samples randomly from the entire area-with a human supplying the true label for each such sample-with the hope that the labeled data thus collected capture statistically all of the data diversity in the area. A major shortcoming of this approach is that, in an interactive session, it is difficult for a human to ensure that the information provided by the next data sample chosen by the random sampler is nonredundant with respect to the data already collected. In order to reduce the annotation burden caused by this uncertainty, it makes sense to remove any redundancies from the entire dataset before presenting its samples to the human for annotation. This article presents a framework that uses a combination of clustering and compression to create a concise-set representation of the land-cover data for a large geographic area. Whereas clustering is achieved by applying locality-sensitive hashing to the data elements, compression is achieved by choosing a single data element to represent a cluster. This framework reduces the annotation burden on the human and makes it more likely that the human would persevere during the annotation stage. We validate our framework experimentally by comparing it with the traditional random sampling approach using WorldView2 satellite imagery.
    Land Cover
    Ground truth
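    The clustering-plus-compression step can be sketched with random-hyperplane locality-sensitive hashing: feature vectors hashing to the same bucket are treated as one cluster, and a single element per bucket is kept for annotation. The hash family, number of bits and representative choice below are assumptions made for illustration, not the article's exact construction.

        # Illustrative sketch: random-hyperplane LSH groups similar feature vectors into
        # buckets; one representative per bucket forms the concise set shown to the annotator.
        import numpy as np

        def lsh_concise_set(features: np.ndarray, n_bits: int = 12, seed: int = 0):
            rng = np.random.default_rng(seed)
            planes = rng.normal(size=(n_bits, features.shape[1]))
            codes = (features @ planes.T > 0)             # sign pattern = hash code
            buckets = {}
            for idx, code in enumerate(map(tuple, codes)):
                buckets.setdefault(code, []).append(idx)
            # Keep the element closest to its bucket's mean as the representative.
            representatives = []
            for members in buckets.values():
                centroid = features[members].mean(axis=0)
                dists = np.linalg.norm(features[members] - centroid, axis=1)
                representatives.append(members[int(np.argmin(dists))])
            return sorted(representatives)

        patches = np.random.default_rng(1).normal(size=(10000, 64))  # placeholder patch features
        concise = lsh_concise_set(patches)                           # indices to annotate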
    Large amounts of satellite data are now becoming available, which, in combination with appropriate change detection methods, offer the opportunity to derive accurate information on timing and location of disturbances such as deforestation events across the earth surface. Typical scenarios require the analysis of billions of image patches/pixels. While various change detection techniques have been proposed in the literature, the associated implementations usually do not scale well, which renders the corresponding analyses computationally very expensive or even impossible. In this work, we propose a novel massively-parallel implementation for a state-of-the-art change detection method and demonstrate its potential in the context of monitoring deforestation. The novel implementation can handle large scenarios in a few hours or days using cheap commodity hardware, compared to weeks or even years using the existing publicly available code, and enables researchers, for the first time, to conduct global-scale analyses covering large parts of our Earth using little computational resources. From a technical perspective, we provide a high-level parallel algorithm specification along with several performance-critical optimizations dedicated to efficiently map the specified parallelism to modern parallel devices. While a particular change detection method is addressed in this work, the algorithmic building blocks provided are also of immediate relevance to a wide variety of related approaches in remote sensing and other fields.
    Implementation
    Relevance
    Deforestation
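    While the paper parallelizes a specific change detection method with dedicated performance optimizations, the general pattern of splitting a large scene into tiles and processing them in parallel on commodity hardware can be sketched as follows; the per-tile statistic and tile size here are placeholders, not the paper's algorithm.

        # Illustrative sketch: split a large raster into tiles and run a per-tile change
        # statistic in parallel on commodity hardware. The statistic is a placeholder.
        import numpy as np
        from concurrent.futures import ProcessPoolExecutor

        def tile_change(args):
            before, after = args
            return float(np.abs(after - before).mean())   # placeholder per-tile statistic

        def tiles(img_before, img_after, size=256):
            h, w = img_before.shape
            for y in range(0, h, size):
                for x in range(0, w, size):
                    yield (img_before[y:y + size, x:x + size],
                           img_after[y:y + size, x:x + size])

        if __name__ == "__main__":
            rng = np.random.default_rng(0)
            before = rng.normal(size=(2048, 2048))
            after = before + rng.normal(scale=0.1, size=(2048, 2048))
            with ProcessPoolExecutor() as pool:
                scores = list(pool.map(tile_change, tiles(before, after)))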
    Automatic event detection from time series signals has wide applications, such as abnormal event detection in video surveillance and event detection in geophysical data. Traditional detection methods detect events primarily through similarity and correlation in the data, which can be inefficient and yield low accuracy. In recent years, thanks to significantly increased computational power, machine learning techniques have revolutionized many science and engineering domains. In this study, we apply a deep-learning-based method to the detection of events in time series seismic signals. However, a direct adaptation of ideas from 2D object detection to our problem faces two challenges: the duration of an earthquake event varies significantly, and the generated proposals are temporally correlated. To address these challenges, we propose a novel cascaded region-based convolutional neural network that captures earthquake events of different sizes while incorporating contextual information to enrich the features of each individual proposal. To achieve better generalization, we use densely connected blocks as the backbone of our network. Because some positive events are not correctly annotated, we further formulate the detection task as a learning-from-noise problem. To verify the performance of our detection methods, we apply them to seismic data generated by a bi-axial "earthquake machine" located at the Rock Mechanics Laboratory, with labels acquired with the help of experts. Our numerical tests show that the proposed detection techniques yield high accuracy; they can therefore be powerful tools for locating events in time series data across various applications.
    Similarity (geometry)
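    A minimal sketch of scoring candidate windows of a 1-D seismic trace with a small convolutional network is shown below. The architecture, window length and data are placeholders, and the sketch omits the cascaded region proposals, densely connected blocks and noise-robust training described in the abstract.

        # Illustrative sketch: a tiny 1-D CNN scores sliding windows of a seismic trace;
        # high-scoring windows are candidate event proposals.
        import torch
        import torch.nn as nn

        class TinyEventScorer(nn.Module):
            def __init__(self):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Conv1d(1, 8, kernel_size=7, padding=3), nn.ReLU(),
                    nn.MaxPool1d(4),
                    nn.Conv1d(8, 16, kernel_size=7, padding=3), nn.ReLU(),
                    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
                    nn.Linear(16, 1),
                )

            def forward(self, x):             # x: (batch, 1, window_length)
                return torch.sigmoid(self.net(x)).squeeze(-1)

        model = TinyEventScorer()
        trace = torch.randn(1, 1, 20000)                      # placeholder seismic trace
        window, stride = 1000, 500
        windows = trace.unfold(-1, window, stride)            # (1, 1, n_windows, window)
        windows = windows.squeeze(0).permute(1, 0, 2)         # (n_windows, 1, window)
        with torch.no_grad():
            scores = model(windows)                           # untrained, for shape illustration
        proposals = (scores > 0.5).nonzero(as_tuple=True)[0]  # indices of candidate windows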