What Are You Measuring? Dimensionality and Reliability Analysis of Ability and Speed in Medical School Didactic Examinations.

2016 
Summative didactic evaluation often involves multiple choice questions which are then aggregated into exam scores, course scores, and cumulative grade point averages. To be valid, each of these levels should have some relationship to the topic tested (dimensionality) and be sufficiently reproducible between persons (reliability) to justify student ranking. Evaluation of dimensionality is difficult and is complicated by the classic observation that didactic performance involves a generalized component (g) in addition to subtest specific factors. In this work, 183 students were analyzed over two academic years in 13 courses with 44 exams and 3352 questions for both accuracy and speed. Reliability at all levels was good (>0.95). Assessed by bifactor analysis, g effects dominated most levels resulting in essential unidimensionality. Effect sizes on predicted accuracy and speed due to nesting in exams and courses was small. There was little relationship between person ability and person speed. Thus, the hierarchical grading system appears warrented because of its g-dependence.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    0
    References
    1
    Citations
    NaN
    KQI
    []