Clustered longitudinal data analysis.

Abstract:

Clustered longitudinal data arise when repeated measurements are collected over time on subjects who are themselves grouped into clusters. Examples include longitudinal community intervention studies and family studies with repeated measures on each member. Cluster size is sometimes informative, meaning that the risk for the outcome is related to the cluster size. In this situation, generalized estimating equations (GEE) lead to invalid inferences because GEE assumes that the cluster size is non-informative. In this study, we investigated the performance of GEE, cluster-weighted generalized estimating equations (CWGEE), and within-cluster resampling (WCR) on clustered longitudinal data. Based on extensive simulation studies, we conclude that all three methods provide comparable estimates when the cluster size is non-informative. When cluster size is informative, however, GEE gives biased estimates, while WCR and CWGEE still provide unbiased and consistent estimates under different within-subject "working correlation structures". Because WCR is computationally intensive, whereas CWGEE solves only one estimating equation and is asymptotically equivalent to WCR, CWGEE is the best choice for clustered longitudinal data.
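
The contrast between the three estimators can be illustrated with a toy sketch. This is not the thesis's implementation: it assumes an intercept-only marginal mean, an independence working correlation, and a single level of clustering with no time dimension, and the function names and data are hypothetical. Under those assumptions GEE reduces to the pooled mean, CWGEE (weights 1/n_i) reduces to the mean of the cluster means, and WCR averages the estimates from many resamples that each draw one observation per cluster.

```python
import random


def gee_independence_mean(clusters):
    """Pooled mean: every observation has weight 1, so large clusters dominate."""
    total = sum(sum(c) for c in clusters)
    count = sum(len(c) for c in clusters)
    return total / count


def cwgee_mean(clusters):
    """Cluster-weighted mean: each observation has weight 1/n_i,
    so every cluster contributes equally regardless of its size."""
    return sum(sum(c) / len(c) for c in clusters) / len(clusters)


def wcr_mean(clusters, n_resamples=20000, seed=0):
    """Within-cluster resampling: draw one observation per cluster,
    estimate the mean from that draw, and average over all resamples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        draws = [rng.choice(c) for c in clusters]
        estimates.append(sum(draws) / len(draws))
    return sum(estimates) / len(estimates)


# Hypothetical binary outcomes; the largest cluster has the highest risk,
# so cluster size is informative in this toy data set.
clusters = [[1, 1, 1, 0], [0, 1], [0]]

print("GEE (independence):", gee_independence_mean(clusters))
print("CWGEE:", cwgee_mean(clusters))
print("WCR:", wcr_mean(clusters))
```

In this example the pooled estimate is pulled toward the largest cluster, while the WCR average settles close to the cluster-weighted value. CWGEE reaches that value by solving a single weighted equation, whereas WCR needs many resamples.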

Keywords:

Gee

Estimating equations

Longitudinal data

Resampling

Longitudinal Study

Topics:

Statistical Methods and Bayesian Inference

Statistical Methods and Inference

demographic modeling and climate adaptation

10.18297/etd/1524

Statistical analysis of daily smoking status in smoking cessation clinical trials

Addiction (2011)

Yimei Li E. Paul Wileyto Daniel F. Heitjan

Smoking cessation trials generally record information on daily smoking behavior, but base analyses on measures of smoking status at the end of treatment (EOT). We present an alternative approach that analyzes the entire sequence of daily smoking status observations. We analyzed daily abstinence data from a smoking cessation trial, using two longitudinal logistic regression methods: a mixed-effects (ME) model and a generalized estimating equations (GEE) model. We compared results to a standard analysis that takes abstinence status at EOT as outcome. We evaluated time-varying covariates (smoking history and time-varying drug effect) in the longitudinal analysis and compared ME and GEE approaches. We observed some differences in the estimated treatment effect odds ratios across models, with narrower confidence intervals under the longitudinal models. GEE yields similar results to ME when only baseline factors appear in the model, but gives biased results when one includes time-varying covariates. The longitudinal models indicate that the quit probability declines and the drug effect varies over time. Both the previous day's smoking status and recent smoking history predict quit probability, independently of the drug effect. When analysing outcomes of studies from smoking cessation interventions, longitudinal models with multiple outcome data points, rather than just end of treatment, can make efficient use of the data and incorporate time-varying covariates. The generalized estimating equations approach should be avoided when using time-varying predictors.

Gee

Estimating equations

Odds

Longitudinal Study

Repeated measures design

Mixed model

10.1111/j.1360-0443.2011.03519.x

Citations (4)

Weighting Condom Use Data to Account for Nonignorable Cluster Size

Annals of Epidemiology (2007)

John Williamson Hae‐Young Kim Lee Warner

Gee

Estimating equations

Odds

10.1016/j.annepidem.2007.03.008

Citations (15)

Improved generalized estimating equation analysis via xtqls for implementation of quasi-least squares in Stata

Justine Shults Sarah J. Ratcliffe Mary B. Leonard

Quasi-least squares (QLS) is a method based on the popular generalized estimating equation (GEE) approach that is widely used for analysis of correlated cross-sectional and longitudinal data. This article summarizes the development of QLS that occurred in several manuscripts and describes its implementation with the user-written program xtqls in Stata. In addition, it demonstrates the following advantages of QLS: (i) QLS allows for implementation of some correlation structures that have not yet been implemented in the framework of GEE; (ii) QLS can be applied as an alternative to GEE if the GEE estimate is infeasible; and (iii) QLS is a method in the framework of GEE that uses the same estimating equation for estimation of β as GEE; as a result, implementation of QLS can involve programs already available for GEE. In particular, xtqls calls the Stata program xtgee within an iterative approach that alternates between updating estimates of the correlation parameter α and then using xtgee to solve the GEE estimating equation for β at the current estimate of α. The benefit of this approach is that following implementation of xtqls, all the usual post-regression estimation commands are readily available to the user. The xtqls program is available on the website for the Longitudinal Analysis for Diverse Populations project: http://www.cceb.upenn.edu/~sratclif/QLSproject.html.

Gee

Estimating equations

Least-squares function approximation

Citations (19)

GEE estimation of a misspecified time‐varying covariate: an example with the effect of alcoholism treatment on medical utilization

Statistics in Medicine (2004)

Melanie M. Wall Yu Dai Lynn E. Eberly

The generalized estimating equation (GEE) method is widely used in longitudinal data analysis, particularly when the outcome variable is non‐Gaussian distributed. Under mild regularity conditions, the parameter estimates are consistent and their asymptotic variances are efficient. In an observational study focusing on alcoholism patients, we applied the GEE method to longitudinal count data from medical utilization records from a large national managed care organization. The health services research question was whether there was a change in medical utilization for patients after engaging in alcoholism treatment as compared to before treatment. Thus, the main effect of interest was a time‐varying covariate indicating whether the patient had undergone treatment yet or not. GEE under five different working correlations was employed and mixed results regarding the significance of the treatment effect were found. Because of the large sample size, i.e. 8485 patients with an average of 46 repeated measurements per patient, differences across the estimates produced by the different working correlation structures were suspicious. It is shown that these differences may be caused by the fact that the time‐varying covariate in the marginal mean model is misspecified. A simulation study is performed to demonstrate that misspecification of the time‐varying covariate in the marginal mean structure can cause differences in GEE results across various choices of working correlation structure. Copyright © 2004 John Wiley & Sons, Ltd.

Gee

Estimating equations

Marginal model

10.1002/sim.1966

Citations (5)

Clustered longitudinal data analysis.

Wang Ming

Gee

Estimating equations

Longitudinal data

Resampling

Longitudinal Study

10.18297/etd/1524

Citations (1)

Modeling association in longitudinal binary outcomes: A brief review

Aging & Mental Health (2005)

Maggie Kuchibhatla Gerda G. Fillenbaum

In order to better understand aging, longitudinal studies are run in which participants are evaluated repeatedly and selected end-points (e.g., score on a cognitive screen, falls, occurrence/reoccurrence of a condition) are examined. The objective of the present paper is primarily to describe the methods available that take into account correlation between binary outcomes, and in particular to model the association of binary outcomes after controlling for covariates by using an implementation of generalized estimating equations (GEE) called 'alternating logistic regression' (ALR). In GEE, association within longitudinal outcomes is accounted for but not estimated. Alternating logistic regression, however, basically enables simultaneous estimation of pair-wise odds ratios of outcomes within a cluster, while accounting for the dependence of the outcome on covariates. A sub-sample (n = 2458) from a community-based sample of Duke Established Populations for Epidemiologic Studies of the Elderly is used. In the example used here, logistic regression using GEE and ALR is used to model binary outcomes at three time points (baseline, three and six years later) and to control for covariates in a representative community-based sample 65 years of age and older (n = 2458). The outcomes indicate any problem versus no problem on a five-item activities of daily living (ADL) scale in a community sample. The ALR model, however, provides insight into decline in ADL from baseline to each of the time-points whereas GEE does not. In both controlled and uncontrolled analyses, decline in ADL over three and six-year intervals (baseline to three years later, baseline to six years and three years post-baseline to six years post-baseline) is significant.

Gee

Odds

Longitudinal Study

Estimating equations

Association (psychology)

Sample (material)

10.1080/13607860500090102

Citations (4)

Performance of weighted estimating equations for longitudinal binary data with drop‐outs missing at random

Statistics in Medicine (2002)

John S. Preisser Kurt K. Lohman Paul J. Rathouz

The generalized estimating equations (GEE) approach is commonly used to model incomplete longitudinal binary data. When drop‐outs are missing at random through dependence on observed responses (MAR), GEE may give biased parameter estimates in the model for the marginal means. A weighted estimating equations approach gives consistent estimation under MAR when the drop‐out mechanism is correctly specified. In this approach, observations or person‐visits are weighted inversely proportional to their probability of being observed. Using a simulation study, we compare the performance of unweighted and weighted GEE in models for time‐specific means of a repeated binary response with MAR drop‐outs. Weighted GEE resulted in smaller finite sample bias than GEE. However, when the drop‐out model was misspecified, weighted GEE sometimes performed worse than GEE. Weighted GEE with observation‐level weights gave more efficient estimates than a weighted GEE procedure with cluster‐level weights. Copyright © 2002 John Wiley & Sons, Ltd.

Gee

Marginal model

Estimating equations

10.1002/sim.1241

Citations (155)

Generalized Estimating Equations in Longitudinal Studies: A Non-Parametric Alternative for Two-Way Repeated Measures Mixed ANOVA

Research Journal of Pharmacy and Technology (2023)

Kalesh M Karun M. S. Deepthy

Two-group pre-post designs are very commonly used in medical research to study the effect of interventions on numerical outcome variables. Sometimes these measurements don’t follow the fundamental statistical assumption of normality and two-way repeated measures mixed ANOVA cannot be used. Generalized Estimating Equation (GEE) with Gamma log link function is a non-parametric analogue that can be used when data is skewed. When compared to other methods GEE has fewer assumptions and provides precise estimates. In the present study, the application of GEE is demonstrated using simulated data. The steps involved in the GEE analysis using SPSS software are also provided as an easy guide to researchers. This study could help medical researchers understand, perform and interpret GEE in a better way.

Gee

Repeated measures design

Estimating equations

Mixed model

Marginal model

10.52711/0974-360x.2023.00392

Citations (1)

Analysis of partially observed clustered data using generalized estimating equations and multiple imputation.

PubMed (2014)

Kathryn M. Aloisio Sonja A. Swanson Nadia Micali Alison E. Field Nicholas J. Horton

Clustered data arise in many settings, particularly within the social and biomedical sciences. As an example, multiple-source reports are commonly collected in child and adolescent psychiatric epidemiologic studies where researchers use various informants (e.g. parent and adolescent) to provide a holistic view of a subject's symptomatology. Fitzmaurice et al. (1995) have described estimation of multiple source models using a standard generalized estimating equation (GEE) framework. However, these studies often have missing data due to additional stages of consent and assent required. The usual GEE is unbiased when missingness is Missing Completely at Random (MCAR) in the sense of Little and Rubin (2002). This is a strong assumption that may not be tenable. Other options such as weighted generalized estimating equations (WEEs) are computationally challenging when missingness is non-monotone. Multiple imputation is an attractive method to fit incomplete data models while only requiring the less restrictive Missing at Random (MAR) assumption. Previously, estimation of partially observed clustered data was computationally challenging; however, recent developments in Stata have facilitated its use in practice. We demonstrate how to utilize multiple imputation in conjunction with a GEE to investigate the prevalence of disordered eating symptoms in adolescents reported by parents and adolescents as well as factors associated with concordance and prevalence. The methods are motivated by the Avon Longitudinal Study of Parents and their Children (ALSPAC), a cohort study that enrolled more than 14,000 pregnant mothers in 1991-92 and has followed the health and development of their children at regular intervals. While point estimates were fairly similar to the GEE under MCAR, the MAR model had smaller standard errors, while requiring less stringent assumptions regarding missingness.

Gee

Imputation (statistics)

Estimating equations

Concordance

Longitudinal data

Longitudinal Study

Citations (81)

Improved Generalized Estimating Equation Analysis via xtqls for Quasi–Least Squares in Stata

The Stata Journal: Promoting communications on statistics and Stata (2007)

Justine Shults Sarah J. Ratcliffe Mary B. Leonard

Quasi–least squares (QLS) is an alternative method for estimating the correlation parameters within the framework of the generalized estimating equation (GEE) approach for analyzing correlated cross-sectional and longitudinal data. This article summarizes the development of QLS that occurred in several reports and describes its use with the user-written program xtqls in Stata. Also, it demonstrates the following advantages of QLS: (1) QLS allows some correlation structures that have not yet been implemented in the framework of GEE, (2) QLS can be applied as an alternative to GEE if the GEE estimate is infeasible, and (3) QLS uses the same estimating equation for estimation of β as GEE; as a result, QLS can involve programs already available for GEE. In particular, xtqls calls the Stata program xtgee within an iterative approach that alternates between updating estimates of the correlation parameter α and then using xtgee to solve the GEE for β at the current estimate of α. The benefit of this approach is that after xtqls, all the usual postregression estimation commands are readily available to the user.

Gee

Estimating equations

Least-squares function approximation

10.1177/1536867x0700700201

Citations (51)