Advances in Rasch Modeling: New Applications and Directions: Guest Editorial

2015 
In 1960 Georg Rasch helped open the field of Item Response Theory with the model that bears his name, distinguished by its use of a single item parameter to model the relationship between item difficulty and person ability. Various extensions of this relatively simple model have been proposed since then and are regularly applied in assessments. By including additional parameters, for example to model variation in item discriminations (2-PL) or variation in guessing probabilities (3-PL) (Birnbaum, 1968), these extensions model the observed data more closely and, in principle, improve the fit of the response probabilities used to calculate test scores. However, the gain in model fit (and arguably in reliability for particular item types) has a cost: not only are these models more complex, but the resulting test scores are also more difficult to interpret.
In the U.S., various stakeholders, including courts and states, have adopted the Rasch model, in part because it leads to logical and transparent results. With the Rasch model all items are weighted equally in order to define the ability that is to be measured, whereas in the 2-PL and 3-PL models the weighting of the items is defined recursively within the estimation process. Wright and Masters (1981) distinguish the Rasch approach as one that meets the requirements of measurement science: fitting the data to the model is a principled method for testing the hypothesis that the variable is, in fact, stable and meaningfully structured. This point can be stated another way: when one tries instead to fit the model to the data, one loses the property of "specific objectivity" (Rasch, 1977). Wilson (2005) reminds us that the Rasch model framework is fundamentally important to the sorts of interpretations one can make in measurement science.
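For readers who want the three models side by side, they can be written in the conventional notation below. The symbols are not used in the editorial itself and are assumed here: \theta_n is the ability of person n, b_i the difficulty of item i, a_i its discrimination, and c_i its lower asymptote (the "guessing" parameter).

```latex
\begin{align*}
\text{Rasch:}\quad & P(X_{ni}=1 \mid \theta_n, b_i)
  = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)} \\[4pt]
\text{2-PL:}\quad & P(X_{ni}=1 \mid \theta_n, a_i, b_i)
  = \frac{\exp\bigl(a_i(\theta_n - b_i)\bigr)}{1 + \exp\bigl(a_i(\theta_n - b_i)\bigr)} \\[4pt]
\text{3-PL:}\quad & P(X_{ni}=1 \mid \theta_n, a_i, b_i, c_i)
  = c_i + (1 - c_i)\,\frac{\exp\bigl(a_i(\theta_n - b_i)\bigr)}{1 + \exp\bigl(a_i(\theta_n - b_i)\bigr)}
\end{align*}
```

Setting a_i = 1 and c_i = 0 for every item reduces the 3-PL to the Rasch model. In this notation the equal weighting of items corresponds to the fact that, under the Rasch model, the unweighted raw score \sum_i x_{ni} is a sufficient statistic for \theta_n, whereas under the 2-PL the corresponding statistic is the weighted sum \sum_i a_i x_{ni} (with the a_i treated as known), whose weights are themselves estimated from the data.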
When choosing and evaluating a measurement model, we should think of the geographic map: the idea of the "location" of an item response with respect to the location of another item response only makes sense if that relative meaning is independent of the location of the respondent involved - i.e., the interpretation of relative locations needs to be uniform no matter where the respondent is. This invariance requirement corresponds to the idea that an "inch represents a mile" or a "meter represents a kilometer" wherever you are on a geographical map. The Rasch model upholds these principles of measurement science and accords more squarely with common-sense notions of fairness and order.
While the Rasch model requires that all items be equally discriminating in order to define the ability that is to be measured, the 2-PL and 3-PL models allow discrimination to vary across items and calculate it recursively as part of the estimation process. However, this also affects the estimation of the item difficulty parameter, creating a sharp discontinuity between how Rasch and 2-PL/3-PL item difficulties can be interpreted. Because item discrimination is at least in part a property of how a particular sample of examinees interacts with an item, and not exclusively a property of the item, and because examinees vary across tests, the inescapable consequence is that scores calculated using the 2-PL and 3-PL models are not guaranteed to be as generalizable across tests as scores calculated from data that are constrained to fit the requirements of the Rasch model, especially when the scores are based on the highly sample-dependent 3-PL model. This is part of the reason why the Rasch model is still applied and remains an indispensable tool in the psychometrician's toolkit.
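The map analogy can be illustrated numerically. The following is a minimal sketch, not taken from the editorial: the item parameters and ability values are invented for illustration, and the function names are hypothetical. It compares the log-odds "distance" between two items at several respondent locations, once with equal discriminations (Rasch) and once with unequal discriminations (2-PL).

```python
import math


def p_correct(theta, b, a=1.0, c=0.0):
    """3-PL probability of a correct response; a=1 and c=0 give the Rasch model."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


def logit_gap(theta, item_1, item_2):
    """Distance between two items, in log-odds, as seen by a respondent at theta."""
    return logit(p_correct(theta, **item_1)) - logit(p_correct(theta, **item_2))


# Illustrative (made-up) values: three respondent locations and two items.
thetas = [-2.0, 0.0, 2.0]
rasch_items = ({"b": -0.5}, {"b": 1.0})                      # equal discrimination
twopl_items = ({"b": -0.5, "a": 0.6}, {"b": 1.0, "a": 1.8})  # unequal discrimination

rasch_gaps = [logit_gap(t, *rasch_items) for t in thetas]
twopl_gaps = [logit_gap(t, *twopl_items) for t in thetas]

for t, r, g in zip(thetas, rasch_gaps, twopl_gaps):
    print(f"theta={t:+.1f}  Rasch gap={r:.2f}  2-PL gap={g:.2f}")
```

With these values the Rasch gap equals the difference in difficulties, 1.5, at every ability, which is the "same scale everywhere on the map" property. The 2-PL gap moves from 4.5 to 2.1 to -0.3 across the three abilities, so the two items even swap order for the most able respondent, although their difficulty parameters are unchanged.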
In high-stakes tests such as licensure exams, judges and policymakers work to ensure that all items employed to make a consequential decision embody the same construct, i.e., that all items are equally discriminating. These stakeholders rely on measurement scientists' efforts to remove items that discriminate along different and possibly unknown dimensions. The convenience and parsimony offered by the less exacting 2-PL and 3-PL models for test construction come at a high price: losing the ability to claim comparability of scores across tests, or to know where one is on the variable map and the distance from one location to another. …