On the consequences of misspecifing assumptions concerning residuals distribution in a repeated measures and nonlinear mixed modelling context

Rachid El Halimi, Jordi Ocaña Rebull

2004

In this paper we describe the results of a simulation study performed to assess the robustness of the Lindstrom and Bates (1990) approximation method under non-normality of the residuals, in several different situations. Concerning the fixed effects, the observed coverage probabilities and the true bias and mean square error values show that some aspects of this inferential approach are not completely reliable. When the true distribution of the residuals is asymmetrical, the true coverage is markedly lower than the nominal one. The best results are obtained for the skew normal distribution, and not for the normal distribution. On the other hand, the results are partially reversed concerning the random effects. Soybean genotypes data are used to illustrate the methods and to motivate the simulation scenarios.

1. Motivation and Introduction

The nonlinear mixed effects model is used to represent data in pharmacokinetics (Davidian and Giltinan, 1995), breast cancer dynamics (El Halimi et al., 2003), growth curves in soybean genotypes data (Pinheiro and Bates, 2000) and other areas where the within-individual model is a function of individual-specific, scientifically meaningful parameters. It is well known that maximum likelihood estimation for nonlinear mixed effects models leads to a cumbersome integration problem, because the random parameters appear inside the nonlinear expectation function. To avoid this problem, several approximations have been proposed. The parametric approach to nonlinear mixed-effects modeling using the LB-method (Lindstrom and Bates, 1990) is essentially based on the standard assumption of normality of the errors and random effects. However, these assumptions may not always be realistic and are, in any case, difficult to verify, because the errors and random effects are not directly observed.
In this paper we investigate the impact of non-normality on the estimation of the fixed and random component parameters, via a Monte Carlo simulation study based on the soybean genotypes model reported in Davidian and Giltinan (1995) and analyzed by Pinheiro and Bates (2000). Typical profiles are displayed in Figure 1, where leaf weight is plotted by subject. The goal of the study was to compare the growth patterns of two soybean genotypes: a commercial variety, Forrest (F), and an experimental strain, Plant Introduction #416937 (P). Data were collected during three years, from 1988 to 1990. At the beginning of the growing season in each year, 16 plots were planted with seeds, 8 plots with each genotype. Each plot was sampled eight to ten times at approximately weekly intervals. At each sampling time, six plants were randomly selected from each plot, the leaves from these plants were weighed, and the average leaf weight per plant (in g) was calculated for each plot. Different plots in different sites were used in different years.

The logistic model derived from Pinheiro and Bates (2000) is an appropriate characterization of the leaf weight response, where the parameters may vary across subjects:

$$
y_{ij} = \frac{\delta_{1i}}{1 + \exp\left[(\delta_{2i} - t_{ij})/\delta_{3i}\right]} + e_{ij},
\qquad
\begin{cases}
\delta_{1i} = \delta_1 + \eta_{1i} \\
\delta_{2i} = \delta_2 + \eta_{2i} \\
\delta_{3i} = \delta_3 + \eta_{3i}
\end{cases}
\tag{1.1}
$$

where $y_{ij}$ represents the average leaf weight per plant in subject $i$, $i = 1, \ldots, 48$, at time $t_{ij}$. The random effects $\eta_i = (\eta_{1i}, \eta_{2i}, \eta_{3i})'$ are $N(0, D)$, and the $e_{ij}$ are $N(0, \sigma^2)$ and are independent of the $\eta_i$. The association of the fixed effects $\delta$ with the random effects vector $\eta_i$ is represented by the linear function above, where the subject-specific parameters $\delta_i$ are independent across $i$. But, as some of the analyses presented in El Halimi et al. (2003) and the normal Q-Q plot of the residuals (under the homogeneity assumption) displayed in Figure 2 suggest, the normality assumption may not always be realistic.
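To make the mean function in model (1.1) concrete, here is a minimal Python sketch. It assumes the three-parameter logistic form δ1 / (1 + exp((δ2 − t)/δ3)), the parameterization used for the soybean data in Pinheiro and Bates (2000). The function name `logistic_mean` is hypothetical and does not come from the paper.

```python
import numpy as np

def logistic_mean(t, delta1, delta2, delta3):
    """Expected leaf weight at time t under a three-parameter logistic curve.

    delta1: asymptotic leaf weight, delta2: time at which half of the
    asymptote is reached, delta3: time scale of the growth phase.
    (This reading of the parameters is an assumption based on the usual
    logistic parameterization, not a statement taken from the paper.)
    """
    t = np.asarray(t, dtype=float)
    return delta1 / (1.0 + np.exp((delta2 - t) / delta3))
```

Under this form, evaluating the curve at t = δ2 gives exactly half of the asymptote δ1, which is a quick check that the three parameters have been passed in the intended order.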
These violations of the model assumptions raise questions about the validity of the inferences made during the modeling process. As a first approach to answering these questions, we performed a simulation study emulating the conditions of the soybean genotypes studies described above.

2. Simulation study on the distributional assumptions

We carried out several simulation studies in which data were generated according to the soybean genotypes model given in equation (1.1), with known "population" or "true" parameter values. For the fixed effects, these values were taken as $\delta = (19.26, 55, 8.4)'$. For the random effects, the covariance matrix was

$$
D = \begin{pmatrix}
25 & 2.50 & 4.00 \\
2.50 & 8.00 & 2.32 \\
4.00 & 2.32 & 2.00
\end{pmatrix}.
$$

The random effects were generated according to an expression equivalent to $\eta_i = L h$, where $h$ stands for a standardised version of the vector of random effects (common for all $i$), generated from a normal distribution with zero mean and unit variance, and $L$ stands for the lower triangular matrix resulting from the Cholesky decomposition of the covariance matrix $D$. These values were chosen to be close to the estimates given by the S-PLUS 2000 implementation of the LB-method (the nlme function), using the maximum likelihood variant of the estimation procedure. The residuals or errors were generated in a similar way, first as i.i.d. standardised values that were subsequently converted to values with standard deviation $\sigma$ (in this particular case $\sigma = 1$), according to the following marginal distributions:

N: Normal distribution, which represents the case where the usual assumption of normality of the errors is valid.
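The data-generating step for the normal scenario can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code. It reads D as the symmetric matrix whose upper triangle is 25, 2.50, 4.00; 8.00, 2.32; 2.00. It draws an independent standardised vector for each subject, which is one reading of the text. The weekly sampling times are hypothetical, since the actual design points are not given here. The function name `simulate_soybean` is likewise hypothetical.

```python
import numpy as np

def simulate_soybean(n_subjects=48, times=None, sigma=1.0, seed=0):
    """Simulate leaf-weight profiles from the logistic mixed model (normal case)."""
    rng = np.random.default_rng(seed)
    delta = np.array([19.26, 55.0, 8.4])        # "true" fixed effects
    D = np.array([[25.0, 2.50, 4.00],           # assumed symmetric reading of D
                  [2.50, 8.00, 2.32],
                  [4.00, 2.32, 2.00]])
    if times is None:
        times = np.arange(14.0, 85.0, 7.0)      # hypothetical weekly sampling days
    L = np.linalg.cholesky(D)                   # lower triangular, L @ L.T == D
    h = rng.standard_normal((n_subjects, 3))    # standardised random effects
    eta = h @ L.T                               # row i holds eta_i = L h_i
    d = delta + eta                             # subject-specific parameters
    mean = d[:, [0]] / (1.0 + np.exp((d[:, [1]] - times) / d[:, [2]]))
    e = sigma * rng.standard_normal(mean.shape) # i.i.d. N(0, sigma^2) residuals
    return times, eta, mean + e

times, eta, y = simulate_soybean()
```

Replacing the two `standard_normal` draws with standardised draws from another distribution (zero mean, unit variance) would give the non-normal scenarios, while the Cholesky step keeps the target covariance D unchanged.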
