    Recurrent Neural Networks for Multivariate Time Series with Missing Values
    Citations: 30
    References: 10
    Abstract:
    Multivariate time series data in practical applications, such as health care, geoscience, and biology, are characterized by a variety of missing values. In time series prediction and other related tasks, it has been noted that missing values and their missing patterns are often correlated with the target labels, a.k.a. informative missingness. There is very limited work on exploiting the missing patterns for effective imputation and improving prediction performance. In this paper, we develop novel deep learning models, namely GRU-D, as one of the early attempts. GRU-D is based on the Gated Recurrent Unit (GRU), a state-of-the-art recurrent neural network. It takes two representations of missing patterns, i.e., masking and time interval, and effectively incorporates them into a deep model architecture so that it not only captures the long-term temporal dependencies in the time series but also utilizes the missing patterns to achieve better prediction results. Experiments on time series classification tasks with real-world clinical datasets (MIMIC-III, PhysioNet) and synthetic datasets demonstrate that our models achieve state-of-the-art performance and provide useful insights for better understanding and utilization of missing values in time series analysis.
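    For readers who want a concrete picture of how the masking and time-interval inputs are used, the following is a minimal NumPy sketch of a single GRU-D-style step: the last observation of each variable is decayed toward its empirical mean, the hidden state is decayed according to the elapsed time, and the masking vector is appended to the input of an otherwise standard GRU update. The parameter names and the exact gating form here are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grud_step(x_t, m_t, delta_t, x_last, x_mean, h_prev, params):
    """One GRU-D-style step (illustrative sketch, untrained).

    x_t     : current input vector (unreliable where m_t == 0)
    m_t     : masking vector, 1 = observed, 0 = missing
    delta_t : time since each variable was last observed
    x_last  : last observed value of each variable
    x_mean  : empirical mean of each variable (training-set statistic)
    h_prev  : previous hidden state
    """
    p = params
    # Decay rates from the time intervals: gamma = exp(-max(0, W * delta + b)).
    gamma_x = np.exp(-np.maximum(0.0, p["W_gx"] * delta_t + p["b_gx"]))
    gamma_h = np.exp(-np.maximum(0.0, p["W_gh"] @ delta_t + p["b_gh"]))

    # Input decay: trust the last observation less as the gap grows,
    # falling back toward the empirical mean.
    x_hat = m_t * x_t + (1 - m_t) * (gamma_x * x_last + (1 - gamma_x) * x_mean)

    # Hidden-state decay applied before the recurrent update.
    h = gamma_h * h_prev

    # Standard GRU update on [x_hat, m_t] (masking fed in as an extra input).
    inp = np.concatenate([x_hat, m_t])
    z = sigmoid(p["W_z"] @ inp + p["U_z"] @ h + p["b_z"])
    r = sigmoid(p["W_r"] @ inp + p["U_r"] @ h + p["b_r"])
    h_tilde = np.tanh(p["W_h"] @ inp + p["U_h"] @ (r * h) + p["b_h"])
    return (1 - z) * h + z * h_tilde
```

    In this sketch, `params` would hold per-feature input-decay parameters (`W_gx`, `b_gx`), hidden-decay parameters of shape hidden-by-features (`W_gh`, `b_gh`), and the usual GRU weights over the concatenated `[x_hat, m_t]` input.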
    Keywords:
    Imputation (statistics)
    Multiple imputation can be a good solution for handling missing data if data are missing at random. However, this assumption is often difficult to verify. We describe an application of multiple imputation that makes this assumption plausible. This procedure requires contacting a random sample of subjects with incomplete data to fill in the missing information, and then adjusting the imputation model to incorporate the new data. Simulations with missing data that were decidedly not missing at random showed, as expected, that the method restored the original beta coefficients, whereas other methods of dealing with missing data failed. Using a dataset with real missing data, we found that different approaches to imputation produced moderately different results. Simulations suggest that filling in 10% of data that was initially missing is sufficient for imputation in many epidemiologic applications, and should produce approximately unbiased results, provided there is a high response rate on follow-up from the subsample of those with some originally missing data. Such a response rate can probably be achieved if this data collection is planned as an initial approach to dealing with the missing data, rather than at later stages, after further attempts have left only data that are very difficult to complete.
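    The follow-up design described above can be mimicked on simulated data. The sketch below uses scikit-learn's IterativeImputer as a generic stand-in for the authors' imputation model; the 10% follow-up fraction, the logistic missingness mechanism, and the variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)

# Simulate a predictor x and a variable y whose missingness depends on its
# own value (missing not at random).
n = 5000
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)
missing = rng.random(n) < 1 / (1 + np.exp(-(y - 1)))   # larger y -> more likely missing
y_obs = np.where(missing, np.nan, y)

# "Recontact" a random 10% subsample of the incomplete cases and recover
# their true values, as in the follow-up step described above.
incomplete = np.flatnonzero(missing)
followed_up = rng.choice(incomplete, size=int(0.10 * incomplete.size), replace=False)
y_aug = y_obs.copy()
y_aug[followed_up] = y[followed_up]

# Fit the imputation model on the augmented data, then impute the rest.
X_aug = np.column_stack([x, y_aug])
imputer = IterativeImputer(sample_posterior=True, random_state=0)
X_completed = imputer.fit_transform(X_aug)

# The slope recovered from the completed data typically sits closer to the
# true value of 2 than a complete-case fit on the MNAR-depleted sample.
slope = np.polyfit(X_completed[:, 0], X_completed[:, 1], 1)[0]
print(f"slope after follow-up + imputation: {slope:.2f}")
```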
    Imputation (statistics)
    Missing data are a common problem in nutritional epidemiology. Little is known of the characteristics of these missing data, which makes it difficult to conduct appropriate imputation. We telephoned, at random, 20% of subjects (n = 2091) from the Adventist Health Study-2 cohort who had any of 80 key variables missing from a dietary questionnaire. We were able to obtain responses for 92% of the missing variables. We found a consistent excess of "zero" intakes in the filled-in data that were initially missing. However, for frequently consumed foods, most missing data were not zero, and these were usually not distinguishable from a random sample of nonzero data. Older, black, and less-well-educated subjects had more missing data. Missing data are more likely to be true zeroes in older subjects and those with more missing data. Zero imputation for missing data may create little bias except for more frequently consumed foods, in which case zero imputation will be suboptimal if more than 5%-10% of the data are missing. Although some missing data represent true zeroes, much of it does not, and data are usually not missing at random. Automatic imputation of zeroes for missing data will usually be incorrect, although there is little bias unless the foods are frequently consumed. Certain identifiable subgroups have greater amounts of missing data and require greater care in making imputations.
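    As a toy illustration of the recommendation above, the snippet below imputes zeroes only for items that are mostly zero among respondents and draws from the observed distribution for frequently consumed items. The column names, the 50% cut-off, and the missingness rate are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical food-frequency columns with missing entries (servings/week).
ffq = pd.DataFrame({
    "bread": rng.poisson(10, 500).astype(float),      # frequently consumed
    "soy_milk": rng.poisson(0.3, 500).astype(float),  # rarely consumed
})
ffq.loc[ffq.sample(frac=0.08, random_state=1).index] = np.nan  # 8% of rows missing

for col in ffq.columns:
    observed = ffq[col].dropna()
    # Zero imputation is usually harmless for rarely consumed foods, where most
    # true missing values are zeroes; for frequently consumed foods, drawing
    # from the observed nonzero distribution is a safer default.
    if (observed > 0).mean() < 0.5:
        ffq[col] = ffq[col].fillna(0.0)
    else:
        fill = observed.sample(ffq[col].isna().sum(), replace=True, random_state=1)
        ffq.loc[ffq[col].isna(), col] = fill.to_numpy()
```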
    Imputation (statistics)
    Missing data is a major problem in real-world datasets, which hinders the performance of data analytics. Conventional data imputation schemes such as univariate single imputation replace missing values in each column with the same approximated value. These univariate single imputation techniques underestimate the variance of the imputed values. On the other hand, multivariate imputation explores the relationships between different columns of data to impute the missing values. Reinforcement Learning (RL) is a machine learning paradigm in which an agent learns by taking actions and receiving rewards in response, to achieve its goal. In this work, we propose an RL-based approach that imputes missing data by learning an imputation policy through action-reward-based experience. Our approach imputes missing values in a column by working only on the same column (similar to univariate single imputation) but fills the column with different values, thus keeping the variance in the imputed values. We report superior performance of our approach, compared with other imputation techniques, on a number of datasets.
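    The abstract does not spell out the agent, state, or reward design, so the following is only a loose analogue: an epsilon-greedy bandit whose actions are quantiles of the observed column and whose reward is the negative error on deliberately hidden observed entries; imputations are then sampled rather than fixed, which preserves some variance. Everything here, from the candidate set to the softmax sampling step, is an assumption for illustration.

```python
import numpy as np

def bandit_impute(col, n_candidates=10, n_episodes=2000, eps=0.1, seed=0):
    """Very loose RL-flavoured univariate imputation (illustrative only).

    Actions are quantiles of the observed values; the reward for an action is
    the negative absolute error on an observed entry that we hide on purpose.
    Missing entries are then filled by sampling actions from the learned
    preferences, so the imputed values vary instead of being a single constant.
    """
    rng = np.random.default_rng(seed)
    col = np.asarray(col, dtype=float)
    observed = col[~np.isnan(col)]
    candidates = np.quantile(observed, np.linspace(0.05, 0.95, n_candidates))

    q_values = np.zeros(n_candidates)   # running value estimate per action
    counts = np.zeros(n_candidates)

    for _ in range(n_episodes):
        target = rng.choice(observed)              # pretend this entry is missing
        if rng.random() < eps:
            a = int(rng.integers(n_candidates))    # explore
        else:
            a = int(np.argmax(q_values))           # exploit
        reward = -abs(candidates[a] - target)
        counts[a] += 1
        q_values[a] += (reward - q_values[a]) / counts[a]

    # Turn the value estimates into a sampling distribution so that the
    # imputed values keep some variance (softmax over estimated values).
    probs = np.exp(q_values - q_values.max())
    probs /= probs.sum()
    filled = col.copy()
    n_missing = int(np.isnan(col).sum())
    filled[np.isnan(col)] = rng.choice(candidates, size=n_missing, p=probs)
    return filled
```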
    Imputation (statistics)
    Univariate
    Citations (15)
    In medical research missing data are sometimes inevitable. Different missingness mechanisms can be distinguished: (a) missing completely at random; (b) missing by design; (c) missing at random; and (d) missing not at random. If participants with missing data are excluded from statistical analyses, this can lead to biased study results and loss of statistical power. Imputation methods can be applied to estimate missing values; multiple imputation gives a good idea of the inaccuracy of the reconstructed measurements. The most common imputation methods assume that missing data are missing at random. Multiple imputation contributes greatly to the efficiency and reliability of estimates because maximum use is made of the data collected. Imputation is not meant to obviate low-quality data.
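    To make the point about multiple imputation quantifying the inaccuracy of reconstructed values concrete, here is a small sketch that produces m imputed datasets with scikit-learn's IterativeImputer and pools a column mean with Rubin's rules; the choice of estimand and of m are arbitrary for illustration.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

def pooled_mean(X, column, m=20):
    """Estimate a column mean under multiple imputation with Rubin's rules.

    X is an array with np.nan for missing values; each of the m imputations
    uses a different random draw, and the pooled variance combines the
    within-imputation and between-imputation components.
    """
    estimates, variances = [], []
    n = X.shape[0]
    for i in range(m):
        imputer = IterativeImputer(sample_posterior=True, random_state=i)
        completed = imputer.fit_transform(X)
        col = completed[:, column]
        estimates.append(col.mean())
        variances.append(col.var(ddof=1) / n)      # within-imputation variance

    q_bar = np.mean(estimates)                     # pooled point estimate
    u_bar = np.mean(variances)                     # average within variance
    b = np.var(estimates, ddof=1)                  # between-imputation variance
    total_var = u_bar + (1 + 1 / m) * b            # Rubin's rules
    return q_bar, np.sqrt(total_var)
```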
    Imputation (statistics)
    Citations (18)
    Databases for machine learning and data mining often have missing values. How to develop effective methods for missing value imputation is an important problem in the field of machine learning and data mining. In this paper, several methods for dealing with missing values in incomplete data are reviewed, and a new method for missing value imputation based on iterative learning is proposed. The proposed method rests on a basic assumption: there exist cause-effect connections among condition attribute values, and the missing values can be induced from known values. In the process of missing value imputation, a portion of the missing values is filled in first and converted to known values, which are then used for the next step of imputation. The iterative learning process continues until the incomplete data are entirely converted to complete data. The paper also presents an example to illustrate the framework of iterative learning for missing value imputation.
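    A minimal sketch of this iterative idea is given below, with linear regression standing in for the learner (the abstract describes only the general framework, so the learner, the easiest-first scheduling, and the temporary mean fill are assumptions): missing entries in the rows with the fewest gaps are predicted from the currently complete rows, committed as known values, and the process repeats.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def iterative_fill(X, max_rounds=20):
    """Fill missing values a few at a time, reusing earlier fills as known data."""
    X = np.asarray(X, dtype=float).copy()
    col_means = np.nanmean(X, axis=0)
    for _ in range(max_rounds):
        miss = np.isnan(X)
        if not miss.any():
            break
        complete_rows = ~miss.any(axis=1)
        # This round, only complete the rows with the fewest gaps (the easy ones).
        gaps = miss.sum(axis=1)
        easiest = gaps == gaps[gaps > 0].min()
        # Temporary mean fill so the regressors have no NaNs; only the predicted
        # column is committed as a "known" value afterwards.
        X_tmp = np.where(miss, col_means, X)
        for j in range(X.shape[1]):
            rows = easiest & miss[:, j]
            if not rows.any() or complete_rows.sum() < 2:
                continue
            other = [k for k in range(X.shape[1]) if k != j]
            model = LinearRegression().fit(X_tmp[complete_rows][:, other],
                                           X[complete_rows, j])
            X[rows, j] = model.predict(X_tmp[rows][:, other])
        # Newly filled cells count as known values in the next round.
    return X
```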
    Imputation (statistics)
    Iterative Learning Control
    Citations (4)
    When we analyze incomplete data, i.e., data with missing values, we need a way to treat the missing values. A common way to deal with this problem is to delete the cases with missing values. Various other methods have been developed. Among them are the EM algorithm and regression-based algorithms, which can estimate missing values and impute the missing elements with the estimated values. In this paper, we introduce the multiple imputation software SOLAS, which generates multiple imputed data sets.
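    SOLAS is commercial software and is not reproduced here, but the EM algorithm mentioned above can be sketched compactly for data assumed to be multivariate normal; the iteration count and the small ridge term added to the covariance are arbitrary choices for this sketch.

```python
import numpy as np

def em_mvn_impute(X, n_iter=50):
    """EM imputation assuming rows are draws from a multivariate normal.

    X : 2-D array with np.nan marking missing entries.
    Returns the completed array plus the fitted mean and covariance.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    miss = np.isnan(X)

    # Initialise with column means and the covariance of the mean-filled data.
    mu = np.nanmean(X, axis=0)
    X_fill = np.where(miss, mu, X)
    sigma = np.cov(X_fill, rowvar=False) + 1e-6 * np.eye(d)

    for _ in range(n_iter):
        C_sum = np.zeros((d, d))
        for i in range(n):
            m, o = miss[i], ~miss[i]
            if not m.any():
                continue
            # E-step: conditional mean/covariance of missing given observed.
            reg = sigma[np.ix_(m, o)] @ np.linalg.inv(sigma[np.ix_(o, o)])
            X_fill[i, m] = mu[m] + reg @ (X[i, o] - mu[o])
            C_full = np.zeros((d, d))
            C_full[np.ix_(m, m)] = sigma[np.ix_(m, m)] - reg @ sigma[np.ix_(o, m)]
            C_sum += C_full
        # M-step: update mean and covariance from the completed data.
        mu = X_fill.mean(axis=0)
        diff = X_fill - mu
        sigma = (diff.T @ diff + C_sum) / n + 1e-6 * np.eye(d)
    return X_fill, mu, sigma
```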
    Imputation (statistics)
    Citations (0)
    Missing data on network ties is a fundamental problem for network analyses. The biases induced by missing edge data, even when missing completely at random (MCAR), are widely acknowledged (Kossinets, 2006; Huisman & Steglich, 2008; Huisman, 2009). Although model-based techniques for missing network data are quite promising, they are not available for all analyses (Koskinen, Robins & Pattison, 2010). Multiple imputation for network data is able to overcome this problem. This study expands on recent work on multiple imputation of missing data in networks (Wang et al., 2016) with extensive simulations. Different models for imputing the missing data are compared under 64 conditions.
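    The flavour of such a simulation can be shown in a few lines: delete dyads completely at random from a simulated network, then compare a naive null-tie treatment with a very simple multiple imputation that redraws missing dyads at the observed density. This Bernoulli imputer is only a placeholder for the model-based imputations actually compared in the study; the network size, density, and missingness fraction are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

n_nodes, true_density, frac_missing = 60, 0.10, 0.30
n_dyads = n_nodes * (n_nodes - 1) // 2

# Simulate an undirected network and delete dyads completely at random (MCAR).
ties = rng.random(n_dyads) < true_density
observed = np.where(rng.random(n_dyads) < frac_missing, np.nan, ties.astype(float))

# Naive "null tie" treatment: every unobserved dyad counted as absent.
density_null = np.sum(observed == 1) / n_dyads

# Simple multiple imputation: draw the missing dyads at the observed density.
obs_density = np.nanmean(observed)
m, estimates = 50, []
for _ in range(m):
    draws = rng.random(n_dyads) < obs_density
    completed = np.where(np.isnan(observed), draws, observed)
    estimates.append(completed.mean())

print(f"true density      : {true_density:.3f}")
print(f"null-tie estimate : {density_null:.3f}")   # biased downward under MCAR
print(f"pooled MI estimate: {np.mean(estimates):.3f}")
```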
    Imputation (statistics)
    Citations (0)
    Missing values are a common occurrence in condition monitoring datasets. To effectively improve the integrity of the data, many data imputation methods have been developed to replace missing values with estimated values. However, these methods do not always perform well in datasets containing different types of missing values. Three types of missing data are defined, namely the isolated missing value, the continuous missing variable, and the continuous missing sample. A three-step data imputation method is proposed to sequentially impute these missing values following the principle of proceeding from easy to difficult. The original time series data is first split into different segments according to the positions of continuous missing samples. Then, interpolation and space-based methods are applied to sequentially estimate isolated missing values and continuous missing variables in each segment. Finally, a stepwise extrapolation prediction model based on the long short-term memory network is established to repair continuous missing samples between segments. Two application examples are implemented on different dissolved gas analysis datasets and load datasets. Compared with state-of-the-art techniques, the proposed three-step data imputation method is general and can be applied to many scenarios because it establishes a rational data recovery sequence to accurately repair both stationary and non-stationary condition monitoring data.
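    A heavily simplified version of this easy-to-difficult ordering can be sketched as follows, with linear interpolation for isolated missing values, a cross-variable regression standing in for the space-based step, and a crude moving-average forecast standing in for the paper's LSTM extrapolation model; all three substitutions are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def three_step_impute(df, window=5):
    """Simplified three-step imputation: easy gaps first, hard gaps last."""
    df = df.copy()

    # Rows where every variable is missing are the "continuous missing samples".
    sample_missing = df.isna().all(axis=1)

    # Step 1: isolated missing values -> linear interpolation within each column,
    # limited to single gaps and excluding the fully missing rows.
    easy = df.loc[~sample_missing].interpolate(limit=1, limit_area="inside")
    df.loc[~sample_missing] = easy

    # Step 2: continuous missing variables -> regress each gappy column on the
    # other columns, using rows where everything else is observed.
    complete = df.dropna()
    for col in df.columns:
        other = [c for c in df.columns if c != col]
        rows = df[col].isna() & df[other].notna().all(axis=1)
        if rows.any() and len(complete) > 1:
            model = LinearRegression().fit(complete[other], complete[col])
            df.loc[rows, col] = model.predict(df.loc[rows, other])

    # Step 3: continuous missing samples -> extrapolate each block from the
    # preceding values (a moving-average stand-in for the paper's LSTM model).
    for col in df.columns:
        s = df[col].to_numpy()
        for t in np.flatnonzero(sample_missing.to_numpy()):
            history = s[max(0, t - window):t]
            if np.isfinite(history).any():
                s[t] = np.nanmean(history)   # crude one-step-ahead forecast
        df[col] = s
    return df
```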
    Imputation (statistics)
    Citations (15)