    Spine Instability Neoplastic Score: agreement across different medical and surgical specialties
57 Citations · 41 References · 10 Related Papers
    Items such as physical exam findings, radiographic interpretations, or other diagnostic tests often rely on some degree of subjective interpretation by observers. Studies that measure the agreement between two or more observers should include a statistic that takes into account the fact that observers will sometimes agree or disagree simply by chance. The kappa statistic (or kappa coefficient) is the most commonly used statistic for this purpose. A kappa of 1 indicates perfect agreement, whereas a kappa of 0 indicates agreement equivalent to chance. A limitation of kappa is that it is affected by the prevalence of the finding under observation. Methods to overcome this limitation have been described.
Topics: Cohen's kappa, Kappa, Statistic, Agreement
    Citations (6,707)
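To make the definition above concrete, kappa for two raters can be computed from a contingency table as kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from the marginal totals. Below is a minimal Python sketch; the 2x2 counts in the example are invented purely for illustration.

    def cohens_kappa(table):
        """Cohen's kappa for two raters, from a square contingency table.

        table[i][j] = number of subjects placed in category i by rater A
        and category j by rater B.
        """
        n = sum(sum(row) for row in table)
        k = len(table)
        # Observed agreement: proportion of subjects on the diagonal.
        p_o = sum(table[i][i] for i in range(k)) / n
        # Expected chance agreement, from the marginal proportions.
        row_marg = [sum(table[i]) / n for i in range(k)]
        col_marg = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
        p_e = sum(row_marg[i] * col_marg[i] for i in range(k))
        return (p_o - p_e) / (1 - p_e)

    # Invented 2x2 counts: 40 agreements on "present", 45 on "absent", 15 disagreements.
    table = [[40, 5],
             [10, 45]]
    print(round(cohens_kappa(table), 2))  # 0.7: raw agreement 0.85, chance agreement 0.50

With these counts the raters agree on 85% of cases, but kappa is 0.70 once the 50% agreement expected by chance is removed.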
Summary: The kappa coefficient is a widely used measure of agreement between observers' independent recordings of diagnoses. Kappa adjusts the overall agreement for expected chance agreement. The dependence of kappa on the prevalence of a diagnosis has not previously been emphasized. This dependence means that kappa does not give a general statement of the reproducibility of a diagnosis. The result of a study of observer agreement should therefore not, as has been done in several studies, be given by the kappa value alone; the kappa value should always be reported together with the original results of the study.
Topics: Kappa, Cohen's kappa, Value (mathematics), Statement (logic)
    Citations (68)
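The prevalence dependence emphasized in the abstract above is easy to demonstrate: two tables with identical raw agreement but different prevalence of the positive finding give very different kappa values. The sketch below reuses the cohens_kappa helper from the earlier example; all counts are invented.

    # Same raw agreement (85 of 100 cases) under two invented prevalence scenarios.
    balanced = [[45, 5],
                [10, 40]]   # finding present in roughly half the subjects
    rare     = [[ 5, 5],
                [10, 80]]   # finding present in roughly one in ten subjects

    for name, t in [("balanced", balanced), ("rare", rare)]:
        print(name, round(cohens_kappa(t), 2))
    # balanced -> 0.7, rare -> 0.32: identical 85% agreement, very different kappa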
Brilliant, L. B. (School of Public Health, U. of Michigan, Ann Arbor, MI 48109), J. M. Lepkowski and D. C. Musch. Reliability of ophthalmic diagnoses in an epidemiologic survey. Am J Epidemiol 1983; 118: 265–79. In the Nepal Blindness Survey, 39,887 people in 105 sites were examined by 10 ophthalmologists from Nepal and four other countries during 1981. Ophthalmic protocols were pretested on approximately 3000 subjects; however, interobserver variability was inevitable. To quantify the amount of variability and assess the reliability of important ophthalmic measures, a study of interobserver agreement was conducted. Five ophthalmologists, randomly assigned to one of two examining stations in a single survey site, carried out 529 pairs of examinations. Eighty demographic and ophthalmic variables were assessed at each station. In 62 of 80 (77.5%) measures, observer agreement exceeded 90%. Since pathologic findings were rare, however, chance agreement alone could yield misleadingly high percent agreement; therefore, the kappa statistic was used to assess the comparative reliability of ophthalmic measures. There were 74 measures for which kappa could be computed and ranked by strength of agreement: 20 (27%) showed excellent agreement (kappa = 0.75–1.00), 39 (53%) showed fair to good agreement (kappa = 0.40–0.74), and 15 (20%) showed poor agreement (kappa < 0.40). In general, measures dealing with blindness prevalence or causes of blindness showed substantial or almost perfect agreement, while polychotomous descriptions of rare clinical signs demonstrated less agreement.
Topics: Kappa, Cohen's kappa, Inter-Rater Reliability, Statistic
The kappa coefficient is a widely used statistic for measuring the degree of reliability between raters. SAS® procedures and macros exist for calculating kappa with two or more raters, but none address situations in which the kappa coefficient alone does not sufficiently describe the level of reliability. When the prevalence of a rating in the population is very high or very low, the value of kappa may indicate poor reliability even with a high observed proportion of agreement. Researchers have recommended reporting several other values in addition to kappa to address this and another paradox of the kappa statistic. This program, developed in SAS® 9.1, calculates kappa but also outputs the observed and expected proportions of agreement, the prevalence and bias indices, and the prevalence-adjusted bias-adjusted kappa (PABAK) for two raters. Because the rater responses are entered in the familiar 2x2 table format using the SAS %WINDOW statement, users with minimal SAS experience will be able to report these statistics and more fully characterize the extent of inter-observer agreement between two raters.
Topics: Cohen's kappa, Kappa, Inter-Rater Reliability, Statistic
    Citations (60)
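For readers without SAS, the quantities that program reports can be reproduced with a few lines of arithmetic. The sketch below uses the commonly cited definitions for a 2x2 table (PABAK = 2·p_o − 1, prevalence index = (a − d)/n, bias index = (b − c)/n); the function name and the cell counts are ours for illustration and are not part of the SAS program described above.

    def agreement_summary(a, b, c, d):
        """Agreement statistics for two raters giving yes/no ratings.

        a = both raters say yes, d = both say no,
        b and c = the two kinds of disagreement.
        """
        n = a + b + c + d
        p_o = (a + d) / n                        # observed proportion of agreement
        p_e = (((a + b) / n) * ((a + c) / n)
               + ((c + d) / n) * ((b + d) / n))  # chance agreement from the marginals
        return {
            "kappa":              (p_o - p_e) / (1 - p_e),
            "observed agreement": p_o,
            "expected agreement": p_e,
            "prevalence index":   (a - d) / n,   # imbalance of the finding itself
            "bias index":         (b - c) / n,   # how differently the raters use "yes"
            "PABAK":              2 * p_o - 1,   # kappa with both effects removed
        }

    # Hypothetical table: rare finding, high raw agreement, modest kappa, high PABAK.
    for name, value in agreement_summary(a=5, b=5, c=10, d=80).items():
        print(f"{name}: {value:.3f}")

In this hypothetical table the raw agreement is 0.85 and kappa only 0.32, while PABAK is 0.70, which is exactly the gap the prevalence and bias indices are meant to explain.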
The kappa statistic is frequently used to test interrater reliability. The importance of rater reliability lies in the fact that it represents the extent to which the data collected in the study are correct representations of the variables measured. Measurement of the extent to which data collectors (raters) assign the same score to the same variable is called interrater reliability. While there have been a variety of methods to measure interrater reliability, traditionally it was measured as percent agreement, calculated as the number of agreement scores divided by the total number of scores. In 1960, Jacob Cohen critiqued the use of percent agreement because of its inability to account for chance agreement. He introduced Cohen's kappa, developed to account for the possibility that raters actually guess on at least some variables because of uncertainty. Like most correlation statistics, kappa can range from -1 to +1. While kappa is one of the most commonly used statistics for testing interrater reliability, it has limitations. Judgments about what level of kappa should be acceptable for health research are questioned. Cohen's suggested interpretation may be too lenient for health-related studies because it implies that a score as low as 0.41 might be acceptable. Kappa and percent agreement are compared, and levels of both kappa and percent agreement that should be demanded in healthcare studies are suggested.
Topics: Inter-Rater Reliability, Kappa, Cohen's kappa, Statistic
    Citations (13,543)
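To illustrate the contrast drawn above between percent agreement and kappa, including the fact that kappa can fall below zero when raters disagree more often than chance would predict, here is a short sketch with invented counts; it reuses the cohens_kappa helper defined earlier.

    def percent_agreement(table):
        """Raw agreement: cases on the diagonal divided by all cases."""
        n = sum(sum(row) for row in table)
        return sum(table[i][i] for i in range(len(table))) / n

    concordant = [[40, 5],
                  [10, 45]]   # raters mostly agree
    discordant = [[10, 40],
                  [40, 10]]   # raters systematically disagree

    for name, t in [("concordant", concordant), ("discordant", discordant)]:
        print(name, round(percent_agreement(t), 2), round(cohens_kappa(t), 2))
    # concordant -> 0.85 agreement, kappa 0.7
    # discordant -> 0.20 agreement, kappa -0.6 (worse than chance)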
Purpose: This paper introduces readers to the problem of measuring interrater agreement in observer variation studies. The statistic most usually quoted is the kappa coefficient, which measures agreement after correcting for chance. Method: The kappa coefficient for measuring agreement between two observers is introduced, and some pointers on sample size estimation are given. Results: Some properties of the kappa coefficient are illustrated with examples from the author's teaching experience. Conclusion: The kappa coefficient is recommended for measuring agreement in observer variation studies.
Topics: Cohen's kappa, Kappa, Inter-Rater Reliability, Coefficient of variation, Statistic, Variation (astronomy), Observer (physics)
    Citations (154)
In M/M Medicine, it has become increasingly important that diagnostic tests are reproducible. The kappa statistic is the measure most frequently used to describe the interobserver agreement of diagnostic procedures. The main disadvantage of the kappa statistic is its dependence on prevalence, which makes it unpredictable whether a reproducibility study will end with a good kappa value. A previously published theoretical protocol proposed solving this problem by obtaining a prevalence near 0.50. This was evaluated in the present study of the passive hip flexion test. A prevalence of 0.44 was found, with a good to excellent kappa value of 0.75. It is concluded that when the proposed method is implemented in the protocol format for reproducibility studies using kappa statistics, a prevalence near 0.50 can easily be obtained, avoiding unexpectedly low kappa values.
Topics: Kappa, Cohen's kappa, Statistic