Cross-specialty PROMIS-global health differential item functioning
7 Citations · 26 References · 10 Related Papers
Keywords: Differential item functioning; Specialty; Global Health; Item analysis; Classical test theory
Purpose: ReQoL-10 and ReQoL-20 have been developed for use as outcome measures with individuals aged 16 and over who are experiencing mental health difficulties. This paper reports modelling results from the item response theory (IRT) analyses that were used for item reduction.
Methods: From several stages of preparatory work, including focus groups and a previous psychometric survey, a pool of items was developed. After confirming that the ReQoL item pool was sufficiently unidimensional for scoring, IRT model parameters were estimated using Samejima's graded response model (GRM). All 39 mental health items were evaluated for item fit and for differential item functioning with respect to age, gender, ethnicity, and diagnosis. Scales were evaluated for overall measurement precision and for known-groups validity (by care-setting type and by self-rated overall mental health).
Results: The study recruited 4266 participants with a wide range of mental health diagnoses from multiple settings. The IRT parameters demonstrated excellent coverage of the latent construct, with the centres of the item information functions ranging from −0.98 to 0.21 and discrimination slope parameters from 1.4 to 3.6. Only two poorly fitting items were identified, and there was no evidence of differential item functioning of concern. The scales showed excellent measurement precision and known-groups validity.
Conclusion: The results from the IRT analyses confirm the robust structural properties and internal construct validity of the ReQoL instruments. The strong psychometric evidence generated guided item selection for the final versions of the ReQoL measures.
Keywords: Differential item functioning; Measurement invariance; Item analysis
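As a minimal illustration of the graded response model used in the abstract above, the NumPy sketch below computes GRM category probabilities and the item information function for a single polytomous item. The discrimination and threshold values are illustrative only, not the ReQoL estimates.

```python
import numpy as np

def grm_category_probs(theta, a, b):
    """Samejima GRM category probabilities for one item.
    a: discrimination; b: ordered thresholds (len m => m+1 categories, scored 0..m)."""
    t = np.asarray(theta, float)[:, None]
    cum = 1.0 / (1.0 + np.exp(-a * (t - np.asarray(b)[None, :])))      # P(X >= k), k = 1..m
    cum = np.hstack([np.ones((t.shape[0], 1)), cum, np.zeros((t.shape[0], 1))])
    return cum[:, :-1] - cum[:, 1:]                                    # P(X = k), k = 0..m

def grm_item_information(theta, a, b):
    """Item information I(theta) = sum_k P_k'(theta)^2 / P_k(theta)."""
    t = np.asarray(theta, float)[:, None]
    cum = 1.0 / (1.0 + np.exp(-a * (t - np.asarray(b)[None, :])))
    cum = np.hstack([np.ones((t.shape[0], 1)), cum, np.zeros((t.shape[0], 1))])
    probs = cum[:, :-1] - cum[:, 1:]
    dcum = a * cum * (1.0 - cum)              # derivative of each cumulative curve
    dprobs = dcum[:, :-1] - dcum[:, 1:]
    return np.sum(dprobs**2 / np.clip(probs, 1e-12, None), axis=1)

theta = np.linspace(-4, 4, 401)
a, b = 2.0, [-1.2, -0.3, 0.6]                 # illustrative values (slopes in the paper ranged 1.4-3.6)
info = grm_item_information(theta, a, b)
print(f"information peaks near theta = {theta[np.argmax(info)]:.2f}")
```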
Subject examination papers and short tests are widely used in educational testing, and both parametric and non-parametric item response theory can be used to analyse them. A geography examination paper taken by final-year (senior three) liberal-arts senior high school students was analysed under both frameworks, using the Rasch model and the Mokken model respectively. For the Rasch analysis, Winsteps and Xcalibre were used to estimate item parameters, and difficulty, differential item functioning, and item information curves were examined in detail. For the Mokken analysis, the MSP software was used, and accuracy rates and homogeneity coefficients were likewise examined in detail. Four conclusions were reached: (1) difficulty estimates from non-parametric and parametric item response theory were equivalent; (2) items that failed to fit the parametric model fitted the non-parametric model; (3) non-parametric item response theory is more rigorous than parametric item response theory in dimensionality testing and item screening; and (4) the two approaches gave equivalent results in detecting differential item functioning.
Keywords: Differential item functioning; Classical test theory; Test theory; Polytomous Rasch model
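The comparison above pairs a parametric model (Rasch) with a non-parametric one (Mokken). As a rough sketch of the two ingredients, the code below evaluates the dichotomous Rasch response probability and computes Loevinger's overall homogeneity coefficient H (the scalability statistic used in Mokken analysis) from a 0/1 response matrix. The simulated difficulties and sample size are arbitrary; the dedicated packages named in the abstract (Winsteps, Xcalibre, MSP) are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

def rasch_prob(theta, b):
    """Dichotomous Rasch model: P(X = 1 | ability theta, difficulty b)."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def loevinger_H(X):
    """Overall Loevinger/Mokken homogeneity coefficient H from a 0/1 matrix
    (rows = persons, columns = items): H = 1 - observed / expected Guttman errors."""
    n, k = X.shape
    p = X.mean(axis=0)
    order = np.argsort(-p)                     # easiest item first
    Xs, ps = X[:, order], p[order]
    F = E = 0.0
    for i in range(k):
        for j in range(i + 1, k):              # item j is the harder of the pair
            F += np.sum((Xs[:, j] == 1) & (Xs[:, i] == 0))   # pass harder, fail easier
            E += n * ps[j] * (1.0 - ps[i])     # expected errors under independence
    return 1.0 - F / E

theta = rng.normal(0, 1, 2000)                 # simulated examinee abilities
b = np.array([-1.5, -0.5, 0.0, 0.5, 1.5])      # illustrative item difficulties
X = (rng.random((2000, 5)) < rasch_prob(theta[:, None], b[None, :])).astype(int)
print(f"H for Rasch-generated data: {loevinger_H(X):.2f}")
```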
This chapter primarily considers the application of item response theory (IRT) methods for validation of instruments, including the use of IRT to identify both the amount of information provided by each item in multi-item scales and the extent of any differential item functioning (DIF). The principal concept in IRT is the item characteristic curve (ICC). Several methods have been developed for DIF analysis, including powerful approaches using logistic regression models or IRT models; the chapter first illustrates a robust nonparametric analysis using a chi-squared test. The logistic models can help to identify items that give problems, but when items do not fit the model it becomes difficult to include them in subsequent IRT analyses. Both IRT and DIF analyses are particularly useful in screening items for inclusion in new questionnaires, and for checking the validity of assumptions even in traditional tests.
Keywords: Differential item functioning; Classical test theory; Item analysis
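One common regression-based DIF screen mentioned above can be sketched as a likelihood-ratio test comparing nested logistic models: total score only versus total score plus group and a score-by-group interaction. The example below uses statsmodels; the simulated data, variable names, and effect sizes are purely illustrative.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

def logistic_dif(item, total, group):
    """Likelihood-ratio DIF test for one dichotomous item, matching on total score.
    Compares a score-only model with a model adding group and score-by-group terms
    (uniform + non-uniform DIF, 2 df)."""
    X0 = sm.add_constant(total.reshape(-1, 1))
    X1 = sm.add_constant(np.column_stack([total, group, total * group]))
    m0 = sm.Logit(item, X0).fit(disp=0)
    m1 = sm.Logit(item, X1).fit(disp=0)
    lr = 2.0 * (m1.llf - m0.llf)
    return lr, chi2.sf(lr, df=2)

# Toy data with uniform DIF built in: the item is harder for the focal group
# at the same matching-score level.
rng = np.random.default_rng(2)
n = 3000
group = rng.integers(0, 2, n)                        # 0 = reference, 1 = focal
total = rng.normal(0, 1, n)                          # proxy for the matching score
p = 1 / (1 + np.exp(-(1.2 * total - 0.5 * group)))
item = (rng.random(n) < p).astype(int)
lr, pval = logistic_dif(item, total, group)
print(f"LR = {lr:.1f}, p = {pval:.3g}")
```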
Methods for detecting differential item functioning (DIF) have been proposed primarily for the item response theory dichotomous response model. Three measures of DIF for the dichotomous response model are extended to include Samejima's graded response model: two measures based on area differences between item true score functions, and a χ² statistic for comparing differences in item parameters. An illustrative example is presented.
Keywords: Differential item functioning; Statistic; Polytomous Rasch model
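The area-based measures described above compare item true (expected) score functions estimated separately in two groups. Under Samejima's GRM, the expected item score can be written as E[X | θ] = Σ_k P(X ≥ k), which makes a numerical signed/unsigned area easy to sketch; the parameter values below are hypothetical, not taken from the paper.

```python
import numpy as np

def grm_expected_score(theta, a, b):
    """Expected item score under Samejima's GRM with discrimination a and
    ordered thresholds b (len m => categories scored 0..m): E[X] = sum_k P(X >= k)."""
    t = np.asarray(theta, float)[:, None]
    return (1.0 / (1.0 + np.exp(-a * (t - np.asarray(b)[None, :])))).sum(axis=1)

theta = np.linspace(-4, 4, 801)
dx = theta[1] - theta[0]
ref = grm_expected_score(theta, a=1.8, b=[-1.0, 0.0, 1.0])   # reference-group parameters (illustrative)
foc = grm_expected_score(theta, a=1.8, b=[-0.6, 0.4, 1.4])   # focal-group parameters shifted harder

signed = np.sum(ref - foc) * dx              # signed area: direction of DIF
unsigned = np.sum(np.abs(ref - foc)) * dx    # unsigned area: overall magnitude of DIF
print(f"signed area = {signed:.3f}, unsigned area = {unsigned:.3f}")
```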
Simulated data were used to investigate the performance of modified versions of the Mantel-Haenszel method of differential item functioning (DIF) analysis in computerized adaptive tests (CATs). Each simulated examinee received 25 items from a 75-item pool. A three-parameter logistic item response theory (IRT) model was assumed, and examinees were matched on expected true scores based on their CAT responses and estimated item parameters. The CAT-based DIF statistics were found to be highly correlated with DIF statistics based on nonadaptive administration of all 75 pool items and with the true magnitudes of DIF in the simulation. Average DIF statistics and average standard errors also were examined for items with various characteristics. Finally, a study was conducted of the accuracy with which the modified Mantel-Haenszel procedure could identify CAT items with substantial DIF using a classification system now implemented by some testing programs. These additional analyses provided further evidence that the CAT-based DIF procedures performed well. More generally, the results supported the use of IRT-based matching variables in DIF analysis. Index terms: adaptive testing, computerized adaptive testing, differential item functioning, item bias, item response theory.
Keywords: Differential item functioning; Item analysis
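A bare-bones version of the Mantel-Haenszel statistic used in the study above can be computed directly from the 2x2 tables formed at each matched-score level. The sketch below returns the common odds ratio and the ETS delta-metric D-DIF (-2.35 ln α); the stratum counts are made up for illustration.

```python
import numpy as np

def mantel_haenszel_ddif(strata):
    """Mantel-Haenszel common odds ratio and ETS delta-metric D-DIF.
    `strata` holds one (A, B, C, D) tuple per matched-score level:
      A = reference correct, B = reference incorrect,
      C = focal correct,     D = focal incorrect.
    D-DIF = -2.35 * ln(alpha); negative values indicate the item favours the reference group."""
    num = den = 0.0
    for A, B, C, D in strata:
        N = A + B + C + D
        if N == 0:
            continue
        num += A * D / N
        den += B * C / N
    alpha = num / den
    return alpha, -2.35 * np.log(alpha)

# Hypothetical counts for five matched score strata.
strata = [(40, 20, 30, 25), (55, 15, 45, 20), (70, 10, 60, 15),
          (80, 8, 72, 12), (90, 5, 85, 8)]
alpha, ddif = mantel_haenszel_ddif(strata)
print(f"alpha_MH = {alpha:.2f}, MH D-DIF = {ddif:.2f}")
```

Under the ETS-style classification the abstract alludes to, items with |D-DIF| below 1 are typically treated as negligible and those at or above 1.5 (and statistically significant) as exhibiting large DIF.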
Differential item functioning (DIF) has been informally conceptualized as multidimensionality. Recently, more formal descriptions of DIF as multidimensionality have become available in the item response theory literature. This approach assumes that DIF is not a difference in the item parameters of two groups; rather, it is a shift in the distribution of ability along a secondary trait that influences the probability of a correct item response. That is, one group is relatively more able on an ability such as test-wiseness. The parameters of the secondary distribution are confounded with item parameters by unidimensional DIF detection models, and this manifests as differences between estimated item parameters. However, DIF is confounded with impact in multidimensional tests, which may be a serious limitation of unidimensional detection methods in some situations. In the multidimensional approach, DIF is considered to be a function of the educational histories of the examinees. Thus, a better tool for understanding DIF may be provided through structural modeling with external variables that describe background and schooling experience.
Keywords: Differential item functioning; Item analysis; Trait; Equating
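The multidimensional account of DIF described above can be made concrete with a toy simulation: both groups share the same item parameters and the same primary-ability distribution, but the focal group has a lower mean on a secondary trait that also influences the item, so matching on the primary ability alone produces an apparent group difference. Everything below (trait labels, loadings, shift size) is an assumption chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

def simulate(group_shift):
    """One group's responses to a single item under a compensatory 2-D logistic model."""
    theta1 = rng.normal(0, 1, n)                 # primary ability (same distribution in both groups)
    theta2 = rng.normal(group_shift, 1, n)       # secondary trait (e.g., test-wiseness)
    a1, a2, d = 1.2, 0.8, 0.0                    # identical item parameters for both groups
    p = 1 / (1 + np.exp(-(a1 * theta1 + a2 * theta2 + d)))
    return theta1, rng.random(n) < p

t_ref, x_ref = simulate(0.0)                     # reference group: secondary-trait mean 0
t_foc, x_foc = simulate(-0.5)                    # focal group: lower mean on the secondary trait only

# Condition on the primary ability (the usual matching variable) and compare
# correct-response rates: a gap appears even though the item itself is identical.
bins = np.linspace(-2, 2, 9)
for lo, hi in zip(bins[:-1], bins[1:]):
    r = x_ref[(t_ref >= lo) & (t_ref < hi)].mean()
    f = x_foc[(t_foc >= lo) & (t_foc < hi)].mean()
    print(f"theta1 in [{lo:+.1f}, {hi:+.1f}): ref {r:.2f}  focal {f:.2f}")
```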
The aims of this paper are to present findings related to differential item functioning (DIF) in the Patient Reported Outcome Measurement Information System (PROMIS) depression item bank, and to discuss potential threats to the validity of results from studies of DIF. The 32 depression items studied were modified from several widely used instruments. DIF analyses of gender, age and education were performed using a sample of 735 individuals recruited by a survey polling firm. DIF hypotheses were generated by asking content experts to indicate whether or not they expected DIF to be present, and the direction of the DIF with respect to the studied comparison groups. Primary analyses were conducted using the graded item response model (for polytomous, ordered response category data) with likelihood ratio tests of DIF, accompanied by magnitude measures. Sensitivity analyses were performed using other item response models and approaches to DIF detection. Despite some caveats, the items recommended for exclusion or for separate calibration were "I felt like crying" and "I had trouble enjoying things that I used to enjoy." The item, "I felt I had no energy," was also flagged as evidencing DIF and recommended for additional review. On the one hand, false DIF detection (Type 1 error) was controlled to the extent possible by ensuring model fit and purification. On the other hand, power for DIF detection might have been compromised by several factors, including sparse data and small sample sizes. Nonetheless, practical and not just statistical significance should be considered. In this case the overall magnitude and impact of DIF was small for the groups studied, although impact was relatively large for some individuals.
Keywords: Differential item functioning; Polytomous Rasch model; Item bank; Response bias; Sample (material)
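One way to express the "magnitude and impact" language used above is to compare group-specific expected item score curves under the GRM and weight the difference by the focal group's ability density: the weighted mean gives an aggregate (group-level) impact, while the largest pointwise gap shows how big the difference can be for individual respondents. The parameter values below are illustrative and are not the PROMIS depression calibrations.

```python
import numpy as np

def grm_expected(theta, a, b):
    """Expected score under Samejima's GRM: E[X | theta] = sum_k P(X >= k)."""
    t = np.asarray(theta, float)[:, None]
    return (1.0 / (1.0 + np.exp(-a * (t - np.asarray(b)[None, :])))).sum(axis=1)

theta = np.linspace(-4, 4, 801)
ref = grm_expected(theta, a=2.0, b=[-0.5, 0.3, 1.1, 1.9])   # reference-group parameters (illustrative)
foc = grm_expected(theta, a=2.0, b=[-0.2, 0.6, 1.4, 2.2])   # focal-group parameters (illustrative)
diff = ref - foc

# Focal-group ability density (standard normal here) used as the weight.
w = np.exp(-theta**2 / 2)
w /= w.sum()

group_impact = np.sum(diff * w)            # average expected-score difference (aggregate impact)
individual_max = np.max(np.abs(diff))      # largest difference any single respondent could face
print(f"group-level impact: {group_impact:.3f} points; max individual difference: {individual_max:.3f}")
```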