Abstract This article develops a survey design in which the questionnaire is split into components and individuals are administered varying subsets of those components. A multiple imputation method for analyzing data from this design is developed, in which the imputations are created by random draws from the posterior predictive distribution of the missing parts given the observed parts, using Gibbs sampling under a general location-scale model. Results from two simulation studies that investigate the properties of inferences under this design are reported. In the first study, several random split-questionnaire designs are imposed on the complete data from an existing survey collected with a long questionnaire, and the corresponding data elements are extracted to form split data sets. Inferences obtained from the complete data and from the split data are then compared. This comparison suggests that little is lost, at least in the example considered, by administering only parts of the questionnaire to each sampled individual. The second simulation study investigates the efficiency of the split-questionnaire design and the robustness of the estimates to the distributional assumptions used to create the imputations. In this study, several complete and split data sets were generated under a variety of distributional assumptions, and the imputations for the split data sets were created assuming normality. The sampling properties of the point and interval estimates of the regression coefficient in a particular logistic regression model were compared using both the complete and split data sets. This comparison suggests that the loss in efficiency of the split-questionnaire design decreases as the correlation among variables in different parts increases.
The proposed multiple imputation method appears to be sensitive to skewness and relatively insensitive to kurtosis, contrary to the assumed normality of the distribution of the observables. Key Words: Gibbs sampling; Multiple imputation; Nonresponse; Respondent burden
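The core of the imputation step above is drawing the missing parts of a respondent's record from their predictive distribution given the observed parts. The following is a minimal sketch of that single step under an assumed multivariate normal model with known parameters; the full procedure described in the abstract uses a general location-scale model and Gibbs sampling that also draws the parameters, so the function name, the fixed `mu`/`Sigma` inputs, and the example values here are illustrative simplifications, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def impute_conditional_normal(x, mu, Sigma, rng):
    """Draw the missing entries of x from their conditional normal
    distribution given the observed entries, under x ~ N(mu, Sigma).
    NaN marks an unadministered (missing) component."""
    miss = np.isnan(x)
    if not miss.any():
        return x.copy()
    obs = ~miss
    S_oo = Sigma[np.ix_(obs, obs)]
    S_mo = Sigma[np.ix_(miss, obs)]
    S_mm = Sigma[np.ix_(miss, miss)]
    # Conditional mean and covariance of missing given observed
    w = np.linalg.solve(S_oo, x[obs] - mu[obs])
    cond_mu = mu[miss] + S_mo @ w
    cond_Sigma = S_mm - S_mo @ np.linalg.solve(S_oo, S_mo.T)
    out = x.copy()
    out[miss] = rng.multivariate_normal(cond_mu, cond_Sigma)
    return out

# Example: a 3-item questionnaire in which item 2 was not administered
mu = np.zeros(3)
Sigma = np.array([[1.0, 0.6, 0.3],
                  [0.6, 1.0, 0.5],
                  [0.3, 0.5, 1.0]])
x = np.array([1.2, np.nan, -0.4])
completed = impute_conditional_normal(x, mu, Sigma, rng)
```

Repeating such draws (with parameters redrawn from their posterior at each cycle) yields the multiple imputations whose between- and within-imputation variability are combined in the usual multiple imputation analysis.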
Tests of Linear Hypotheses When the Data Are Proportions. James E. Grizzle, Ph.D., Assistant Professor of Biostatistics. https://doi.org/10.2105/AJPH.53.6.970. Published online: August 29, 2011.
Abstract Inference procedures based on some simple rank statistics are proposed and studied for the statistical analysis of longitudinal data. These robust and asymptotically efficient procedures do not require the basic assumption of multivariate normality of the underlying distributions. The theory is illustrated with two examples.
CARET is a two-armed, double-blind, randomized chemoprevention trial to test the hypothesis that oral administration of beta-carotene 30 mg/day plus retinyl palmitate 25,000 IU/day will decrease the incidence of lung cancer in high-risk populations: heavy smokers and asbestos-exposed workers who have smoked. The agents combine antioxidant and nuclear tumor suppressor mechanisms. Fastidious monitoring for possible side effects is facilitated by inclusion of a Vanguard population. As of 31 December 1990, 6,105 of the 18,000 participants needed had been randomized in the trial. Efficacy results are expected in 1999.
Our report summarizes and compares the characteristics of six prospective, multicenter, randomized clinical trials of carotid endarterectomy underway in North America and Europe. Three trials are designed to evaluate the safety and efficacy of endarterectomy in patients with asymptomatic carotid artery stenosis. The other three trials enroll patients who have had transient ischemic attacks or a minor cerebral infarction in the distribution of the randomized artery. Considered together, these six clinical trials span the range of candidates for carotid endarterectomy. The inclusion and exclusion criteria, methodology, and statistical considerations of each study are detailed in tables. The results from these trials will be helpful in resolving some of the questions surrounding endarterectomy, provided the similarities and differences in the study designs are considered when interpreting the results.
We investigate by simulation the properties of four estimation procedures under a linear model for correlated data with Gaussian errors: maximum likelihood based on the normal mixed linear model; generalized estimating equations; a four-stage method; and a bootstrap method that resamples clusters rather than individuals. We pay special attention to group-randomized trials, in which the number of independent clusters is small, cluster sizes are large, and within-cluster correlation is weak. We show that for balanced and nearly balanced data with a small number of independent clusters (⩽10), the bootstrap is superior when analysts do not want to impose strong distributional and covariance-structure assumptions; otherwise, the maximum likelihood and four-stage methods are slightly better. All four methods perform well when the number of independent clusters reaches 50.
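The cluster bootstrap named above differs from the ordinary bootstrap only in its resampling unit: whole clusters are drawn with replacement, so the within-cluster correlation structure is preserved in each resample. The sketch below illustrates the idea with the overall mean as a stand-in statistic; the function name, the synthetic cluster-effect data, and the choice of statistic are assumptions for illustration, not the paper's simulation design.

```python
import numpy as np

rng = np.random.default_rng(0)

def cluster_bootstrap_mean(y, cluster_ids, n_boot, rng):
    """Bootstrap the overall mean by resampling whole clusters
    (with replacement) rather than individual observations."""
    clusters = np.unique(cluster_ids)
    groups = {c: y[cluster_ids == c] for c in clusters}
    stats = np.empty(n_boot)
    for b in range(n_boot):
        # Draw as many clusters as exist, with replacement
        sampled = rng.choice(clusters, size=len(clusters), replace=True)
        resampled = np.concatenate([groups[c] for c in sampled])
        stats[b] = resampled.mean()
    return stats

# Example: 6 clusters of 40 observations with a shared cluster effect,
# mimicking a group-randomized trial with few, large, weakly correlated clusters
n_clusters, m = 6, 40
cluster_ids = np.repeat(np.arange(n_clusters), m)
cluster_effects = rng.normal(0.0, 0.3, size=n_clusters)
y = cluster_effects[cluster_ids] + rng.normal(0.0, 1.0, size=n_clusters * m)

boot = cluster_bootstrap_mean(y, cluster_ids, 500, rng)
boot_se = boot.std(ddof=1)
```

The standard deviation of the bootstrap replicates estimates the standard error of the mean while honoring the clustering, which is why this variant behaves well when the number of independent clusters is small.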