Modern variable selection for longitudinal semi-parametric models with missing data
2
Citation
28
Reference
10
Related Paper
Citation Trend
Abstract:
Penalized methods for variable selection such as the Smoothly Clipped Absolute Deviation penalty have been increasingly applied to aid variable section in regression analysis. Much of the literature has focused on parametric models, while a few recent studies have shifted the focus and developed their applications for the popular semi-parametric, or distribution-free, generalized estimating equations (GEEs) and weighted GEE (WGEE). However, although the WGEE is composed of one main and one missing-data module, available methods only focus on the main module, with no variable selection for the missing-data module. In this paper, we develop a new approach to further extend the existing methods to enable variable selection for both modules. The approach is illustrated by both real and simulated study data.This chapter describes problems that arise through missing QoL assessment data. Situations are outlined where values are missing from otherwise completed questionnaires or where entire forms are missing. The main difficulty with either type of missing data is the bias they may introduce at the analysis stage. QoL data that are "missing" through attrition because the patient has died, from that which could be anticipated but was not returned, are distinguished. How missing values may be estimated, often termed as imputed, to ease the statistical analysis are described, but stress that imputing values is no substitute for collecting "real" data.
Attrition
Imputation (statistics)
Cite
Citations (3)
Imputation (statistics)
Cite
Citations (0)
Imputation (statistics)
Survey data collection
Cite
Citations (0)
Imputation (statistics)
Cite
Citations (126)
Missing data are a well known issue in statistical inference, because some responses may be missing, even when data are collected carefully. The problem that arises in these cases is how to deal with missing data. In this article, the missingness is analyzed in knowledge space theory, and in particular when the basic local independence model (BLIM) is applied to the data. Two extensions of the BLIM to missing data are proposed: The former, called ignorable missing BLIM (IMBLIM), assumes that missing data are missing completely at random; the latter, called missing BLIM (MissBLIM), introduces specific dependencies of the missing data on the knowledge states, thus assuming that the missing data are missing not at random. The IMBLIM and the MissBLIM modeled the missingness in a satisfactory way, in both a simulation study and an empirical application, depending on the process that generates the missingness: If the missing data-generating process is of type missing completely at random, then either IMBLIM or MissBLIM provide adequate fit to the data. However, if the pattern of missingness is functionally dependent upon unobservable features of the data (e.g., missing answers are more likely to be wrong), then only a correctly specified model of the missingness distribution provides an adequate fit to the data.
Unobservable
Imputation (statistics)
Statistical Inference
Missing energy
Cite
Citations (23)
Longitudinal studies are those in which the same variable is repeatedly measured at different times. These studies are more likely than others to suffer from missing values. Since the presence of missing values may have an important impact on statistical analyses, it is important that they should be dealt with properly. In this paper, we present “Copy Mean”, a new method to impute intermittent missing values. We compared its efficiency in eleven imputation methods dedicated to the treatment of missing values in longitudinal data. All these methods were tested on three markedly different real datasets (stationary, increasing, and sinusoidal pattern) with complete data. For each of them, we generated nine types of incomplete datasets that include 10%, 30%, or 50% of missing data using either a Missing Completely at Random, a Missing at Random, or a Missing Not at Random missingness mechanism. Our results show that Copy Mean has a great effectiveness, exceeding or equaling the performance of other methods in almost all configurations. The effectiveness of linear interpolation is highly data-dependent. The Last Occurrence Carried Forward method is strongly discouraged.
Imputation (statistics)
Longitudinal data
Interpolation
Cite
Citations (47)
Data set
Cite
Citations (5)
Even in a well-designed and controlled study, missing data occurs in almost all research. Missing data can reduce the statistical power of a study and can produce biased estimates, leading to invalid conclusions. This manuscript reviews the problems and types of missing data, along with the techniques for handling missing data. The mechanisms by which missing data occurs are illustrated, and the methods for handling the missing data are discussed. The paper concludes with recommendations for the handling of missing data.
Cite
Citations (1,336)
Missing data are a well known issue in statistical inference, because some responses may be missing, even when data are collected carefully. The problem that arises in these cases is how to deal with missing data. In this article, the missingness is analyzed in knowledge space theory, and in particular when the basic local independence model (BLIM) is applied to the data. Two extensions of the BLIM to missing data are proposed: The former, called ignorable missing BLIM (IMBLIM), assumes that missing data are missing completely at random; the latter, called missing BLIM (MissBLIM), introduces specific dependencies of the missing data on the knowledge states, thus assuming that the missing data are missing not at random. The IMBLIM and the MissBLIM modeled the missingness in a satisfactory way, in both a simulation study and an empirical application, depending on the process that generates the missingness: If the missing data-generating process is of type missing completely at random, then either IMBLIM or MissBLIM provide adequate fit to the data. However, if the pattern of missingness is functionally dependent upon unobservable features of the data (e.g., missing answers are more likely to be wrong), then only a correctly specified model of the missingness distribution provides an adequate fit to the data.
Unobservable
Missing energy
Imputation (statistics)
Statistical Inference
Cite
Citations (0)
Imputation (statistics)
Cite
Citations (11)