On Average: Data Exploration Based on Means Can Be Misleading

2012 
INTRODUCTION

On the basis that “a picture is worth a thousand words,” data are often presented graphically in order to convey some of the information that they contain. Modelers use graphical representations of data for numerous purposes, two of the most important being to inform and drive the modeling process and to communicate with their clients, who are the end users of their models. However, it is not always clear which graphic should be used for a particular set of data and for a particular purpose. As such, it is quite possible that choosing the wrong graph may mislead rather than enlighten, which is a situation to be avoided if at all possible. Part of the difficulty lies in the fact that exploratory data analysis (EDA), which incorporates graphical exploration of the data, is perceived to be as much an art as it is a science. The artistic aspect of EDA is based on human creativity and intuition, and it is our intuition that can sometimes fail us.

One of the most intuitive graphs is one in which the data are averaged in some way and the averages plotted. This use of averages is so intuitive that it is frequently adopted without much thought or consideration. It must be acknowledged that in many cases this proves to be a good strategy, though there are occasions when it can be misleading. This paper examines some commonly encountered situations in which such graphics may be misleading. The reasons why they are misleading are explored and explained. In addition, an alternative strategy is suggested.

In order to maintain the confidence of the end users in a model, it is important that any apparent contradiction between the model and the graphical presentation of the data is carefully explained. It is hoped that the following sections will go some way in helping with that explanation.

For the purposes of the following discussion, it will be assumed that the response variable (Y) is recorded on a continuous scale.
The independent variable (x) may be continuous or discrete.

The next section describes a general approach to building a mixed effects model. The two sections after that discuss the use of data averages and their limitations. An alternative to data averaging is then introduced. The data averaging and alternative methods are compared by means of a simulated trial and a real case example. The paper finishes with a discussion.

MIXED EFFECTS MODEL

When the data are grouped in such a way that observations within a group are correlated, the model needs to take account of such correlation. Longitudinal data are, of course, grouped and correlated in this way because repeated observations on the same subject (experimental unit) are correlated. This correlation is due to the fact that such observations reflect the individual characteristics of the subject. There are several options available for modeling correlated data (1), one of which is the use of a mixed effects model incorporating both random and fixed effects. The use of mixed effects models is limited to situations where the correlation between observations within a group is positive, which is the case for many datasets. These mixed effects models are widely used in pharmacometrics and will form the basis of our discussion.

Consider a situation where data were collected from n subjects with m
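
The link between a subject-level random effect and positive within-subject correlation can be illustrated with a minimal simulation sketch. It assumes the simplest possible mixed effects structure, a random intercept model Y_ij = beta0 + b_i + e_ij with independent normal b_i and e_ij, which is not necessarily the model used in this paper. The function names, the value of beta0, and the standard deviations are hypothetical and chosen only for illustration. Under this assumption, the correlation between two observations on the same subject is var(b) / (var(b) + var(e)), which cannot be negative.

```python
import numpy as np


def simulate_random_intercept(n_subjects, n_obs, sd_between, sd_within, seed=0):
    """Simulate Y_ij = beta0 + b_i + e_ij (hypothetical random intercept model).

    Returns an array of shape (n_subjects, n_obs); each row is one subject.
    """
    rng = np.random.default_rng(seed)
    beta0 = 10.0  # illustrative fixed effect (population mean), an assumption
    # One random effect per subject, shared by all of that subject's observations.
    b = rng.normal(0.0, sd_between, size=(n_subjects, 1))
    # Independent residual error for every observation.
    e = rng.normal(0.0, sd_within, size=(n_subjects, n_obs))
    return beta0 + b + e


def theoretical_correlation(sd_between, sd_within):
    """Within-subject correlation implied by the random intercept model."""
    return sd_between**2 / (sd_between**2 + sd_within**2)


# Two observations per subject; the standard deviations are illustrative only.
y = simulate_random_intercept(n_subjects=5000, n_obs=2, sd_between=2.0, sd_within=1.0)

# Correlation between the first and second observation across subjects.
empirical = np.corrcoef(y[:, 0], y[:, 1])[0, 1]
expected = theoretical_correlation(2.0, 1.0)

print(f"empirical within-subject correlation: {empirical:.3f}")
print(f"theoretical value: {expected:.3f}")
```

Because the shared term b_i enters every observation on a subject with the same sign, this structure can only induce non-negative correlation, which is consistent with the restriction to positively correlated groups noted above.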