B-57 Sensitivity and Inter-rater Reliability of the Behavior Rating Inventory of Executive Function-Preschool Version in Children with Epilepsy
    Abstract:
Objective: Executive dysfunction is common in children with epilepsy. Despite earlier studies demonstrating that the Behavior Rating Inventory of Executive Function (BRIEF) is a clinically useful instrument for detecting executive deficits in a school-age epilepsy population, little is available for younger children. The purpose of this study is to evaluate the sensitivity of the preschool-age version of this instrument (BRIEF-P) in young children with epilepsy and to examine its inter-rater reliability. Method: The parents of 22 clinically referred children with epilepsy (Age: M = 4.05, SD = .95, Range = 2-5; IQ: M = 83.31, SD = 25.65, Range = <40-129) completed the BRIEF-P as part of a more comprehensive neuropsychological evaluation. For a smaller subset (n = 12), teachers also submitted BRIEF-P forms. Using a cutoff T-score of ≥65 as the threshold for impairment, sensitivity of the BRIEF-P variables was established. Intra-class correlation coefficients (ICCs) assessed inter-rater reliability (IRR) of the parent and teacher forms. Results: At the parent scale level, emergent metacognition (EMI) (Parent = 59%, Teacher = 42%) and the global executive composite (GEC) (Parent = 41%, Teacher = 42%) were frequently elevated. The most commonly elevated subscales were inhibition (Parent = 36%, Teacher = 60%), working memory (Parent = 63%, Teacher = 75%), and planning/organization (Parent = 41%, Teacher = 33%). With the exception of the emotional control subscale on the BRIEF-P, all other indices demonstrated moderate to excellent IRR, with ICCs ranging from .532 to .918. Conclusion: This study provides preliminary support for the BRIEF-P in preschool-aged children with epilepsy; both the parent-report form and the teacher-report form show sensitivity to executive dysfunction in these children. Furthermore, the BRIEF-P appears to have strong inter-rater reliability.
    Keywords:
    Inter-Rater Reliability
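The abstract above treats a scale as "elevated" when its T-score meets a cutoff of ≥65 and quantifies parent-teacher agreement with intraclass correlation coefficients. As a rough illustration of those two computations, the sketch below uses hypothetical T-scores and a two-way random-effects, absolute-agreement, single-rater ICC(2,1); the study does not state which ICC form was used, so that choice, like the data and function names, is an assumption.

```python
import numpy as np

def proportion_elevated(t_scores, cutoff=65):
    """Share of children whose T-score meets the impairment cutoff (>= 65)."""
    t = np.asarray(t_scores, dtype=float)
    return np.mean(t >= cutoff)

def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a children x raters matrix (e.g., column 0 = parent, column 1 = teacher).
    The study does not report which ICC form was used; this variant is an assumption.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_total = ((x - grand) ** 2).sum()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()   # between-children variation
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()   # between-rater variation
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

# Hypothetical working-memory T-scores for 6 children rated by a parent and a teacher.
scores = np.array([[72, 70], [58, 61], [66, 69], [49, 52], [75, 71], [63, 68]])
print(proportion_elevated(scores[:, 0]))   # share elevated on the parent form
print(icc_2_1(scores))                     # parent-teacher inter-rater reliability
```

With only two informants per child, a single-rater ICC describes the reliability of one report; an averaging form such as ICC(2,k) would instead describe the reliability of the mean of the two reports.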
Evaluating the extent of cerebral ischemic infarction is essential for treatment decisions and assessment of possible complications in patients with acute ischemic stroke. Patients are often triaged according to image-based early signs of infarction, defined by the Alberta Stroke Program Early CT Score (ASPECTS). Our aim was to evaluate interrater reliability in a large group of readers. We retrospectively analyzed 100 investigators who independently evaluated 20 non-contrast computed tomography (NCCT) scans as part of their qualification program for the TENSION study. Test cases were chosen by four neuroradiologists who had previously scored the NCCT scans with ASPECTS between 0 and 8 and with high interrater agreement. Percent and interrater agreement were calculated for total ASPECTS as well as for each ASPECTS region. Percent agreement for total ASPECTS ratings was 28%, with interrater agreement of 0.13 (95% confidence interval [CI]: 0.09-0.16), at zero tolerance allowance, and 66%, with interrater agreement of 0.32 (95% CI: 0.21-0.44), at the tolerance allowance set by the TENSION inclusion criteria. The ASPECTS region with the highest level of agreement was the insular cortex (percent agreement = 96%, interrater agreement = 0.96 [95% CI: 0.94-0.97]) and that with the lowest level of agreement was the M3 region (percent agreement = 68%, interrater agreement = 0.39 [95% CI: 0.17-0.61]). Interrater reliability for total ASPECTS and study enrollment was relatively low but seems sufficient for practical application. Analysis of the individual regions suggests that some are particularly difficult to evaluate, with varying levels of reliability. Potential impairment of the supraganglionic region must be examined carefully, particularly with respect to the decision whether or not to perform mechanical thrombectomy.
    Inter-Rater Reliability
    Allowance (engineering)
    Stroke
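The headline figures in the ASPECTS study above are percent agreement at zero tolerance versus agreement within a tolerance allowance. A minimal sketch of that comparison follows; the reference score, the simulated rater scores, and the one-point tolerance window are hypothetical stand-ins, since the exact TENSION tolerance rule is not reproduced here.

```python
import numpy as np

def percent_agreement(ratings, reference, tolerance=0):
    """Share of rater scores within `tolerance` points of the reference ASPECTS score."""
    r = np.asarray(ratings, dtype=float)
    return np.mean(np.abs(r - float(reference)) <= tolerance)

# Hypothetical example: 10 raters scoring one NCCT scan whose reference ASPECTS is 6.
rater_scores = np.array([6, 5, 7, 6, 4, 6, 8, 5, 6, 7])
print(percent_agreement(rater_scores, reference=6, tolerance=0))  # exact match only
print(percent_agreement(rater_scores, reference=6, tolerance=1))  # within +/- 1 point
```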
The poor reliability of the DSM diagnostic system has been a major issue of concern for many researchers and clinicians. Standardized interview techniques and rating scales have been shown to be effective in increasing interrater reliability in diagnosis and classification. This study hypothesized that the utilization of the Psychological Rating Scale for Diagnostic Classification for assessing the problematic behaviors, symptoms, or other characteristics of an individual would increase interrater reliability, subsequently leading to higher diagnostic agreement between raters and with DSM-III classification. This hypothesis was strongly supported by high overall profile reliability and individual profile reliability. Therefore, utilization of this rating scale would enhance the accuracy of diagnosis and add to the educational efforts of technical personnel and professionals in related disciplines.
    Inter-Rater Reliability
    This study compared four multi-item indices of interrater agreement: (a) intraclass correlation coefficient, (b) within-group interrater agreement, (c) modified within-group interrater agreement, and (d) average deviation index. Findings included (a) that the different indices of interrater agreement provided different information about the agreement across raters, (b) standards for acceptable agreement are inconsistent for the several indices, (c) the average deviation index showed most groups had acceptable agreement, and (d) removing data from analyses based on unacceptable interrater agreement values does little in the way of affecting overall outcomes. Implications include that interrater agreement indices are not interchangeable and that for research purposes the concern about interrater agreement may be overstated.
    Inter-Rater Reliability
    Agreement
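Two of the four indices compared in the study above are the within-group agreement index r_WG(J) and the average deviation index. A minimal sketch of both, assuming the standard uniform null distribution for a scale with A response options, is given below; the example ratings are hypothetical.

```python
import numpy as np

def rwg_j(ratings, n_options):
    """Multi-item within-group agreement r_WG(J) against a uniform null distribution.

    `ratings`: raters x items matrix for one group; `n_options`: number of scale points A.
    """
    x = np.asarray(ratings, dtype=float)
    j = x.shape[1]
    sigma_eu = (n_options ** 2 - 1) / 12.0            # variance of the uniform null
    ratio = x.var(axis=0, ddof=1).mean() / sigma_eu   # mean observed item variance / null
    return (j * (1 - ratio)) / (j * (1 - ratio) + ratio)

def average_deviation(ratings):
    """AD index: mean absolute deviation of raters from each item mean, averaged over items."""
    x = np.asarray(ratings, dtype=float)
    return np.abs(x - x.mean(axis=0)).mean()

# Hypothetical example: 5 raters scoring 3 items on a 5-point scale.
group = np.array([[4, 4, 5], [4, 3, 5], [5, 4, 4], [4, 4, 5], [3, 4, 4]])
print(rwg_j(group, n_options=5))
print(average_deviation(group))
```

The inconsistent standards the study points to show up here as well: the commonly cited r_WG benchmark of .70 has no direct counterpart on the AD metric, which is judged against cutoffs expressed in scale points.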
Background: There is limited information about the agreement and reliability of clinical shoulder tests. Objectives: To assess the interrater agreement and reliability of clinical shoulder tests in patients with shoulder pain treated in primary care. Methods: Patients with a primary report of shoulder pain underwent a set of 21 clinical shoulder tests twice on the same day, by pairs of independent physical therapists. The outcome parameters were the observed and specific interrater agreement for positive and negative scores, and interrater reliability (Cohen's kappa, κ). Positive and negative interrater agreement values of ≥0.75 were regarded as sufficient for clinical use. For Cohen's κ, the following classification was used: <0.20 poor, 0.21-0.40 fair, 0.41-0.60 moderate, 0.61-0.80 good, 0.81-1.00 very good reliability. Participating clinics were randomized into two groups, with or without a brief practical session on how to conduct the tests. Results: A total of 113 patients were assessed in 12 physical therapy practices by 36 physical therapists. Positive and negative interrater agreement values were both sufficient for 1 test (the Full Can Test), neither sufficient for 5 tests, and sufficient for only positive or only negative agreement for 15 tests. Interrater reliability was fair for 11 tests, moderate for 9 tests, and good for 1 test (the Full Can Test). An additional brief practical session did not result in better agreement or reliability. Conclusion: Clinicians should be aware that interrater agreement and reliability for most shoulder tests are questionable and their value in clinical practice limited.
    Inter-Rater Reliability
    Kappa
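The quantities reported for each shoulder test above, specific positive/negative agreement and Cohen's kappa, can be computed from the 2x2 table of paired test outcomes. The sketch below assumes binary positive/negative results from two therapists examining the same patients; the data and function names are hypothetical.

```python
import numpy as np

def two_by_two(rater1, rater2):
    """Counts of the 2x2 table for two raters' binary (positive/negative) results."""
    r1 = np.asarray(rater1, dtype=bool)
    r2 = np.asarray(rater2, dtype=bool)
    a = np.sum(r1 & r2)        # both positive
    b = np.sum(r1 & ~r2)       # rater 1 positive only
    c = np.sum(~r1 & r2)       # rater 2 positive only
    d = np.sum(~r1 & ~r2)      # both negative
    return a, b, c, d

def specific_agreement(rater1, rater2):
    """Specific positive and negative agreement."""
    a, b, c, d = two_by_two(rater1, rater2)
    return 2 * a / (2 * a + b + c), 2 * d / (2 * d + b + c)

def cohens_kappa(rater1, rater2):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    a, b, c, d = two_by_two(rater1, rater2)
    n = a + b + c + d
    po = (a + d) / n
    pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (po - pe) / (1 - pe)

# Hypothetical example: two therapists' results (True = positive) for 8 patients.
r1 = [True, False, True, True, False, False, True, False]
r2 = [True, False, False, True, False, True, True, False]
print(specific_agreement(r1, r2))   # (positive agreement, negative agreement)
print(cohens_kappa(r1, r2))
```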
In order to apply the Comprehensive Psychopathological Rating Scale (CPRS) to psychiatric clinical practice in Japan, we have made a preliminary Japanese version and examined the doctor-to-doctor and doctor-to-nurse inter-rater reliability. Compared with the high reliability of the items of reported psychopathology, the correlation coefficients for observed psychopathology were considerably scattered. Education and training of the raters may improve the reliability. It is considered that the CPRS will become a very useful psychiatric rating scale in Japan.
    Inter-Rater Reliability
The aims of this research were (a) to study the interrater reliability of a posture observation method, (b) to test the impact of different posture categorization systems on interrater reliability, and (c) to provide guidelines for improving interrater reliability. Estimation of posture through observation is challenging. Previous studies have shown varying degrees of validity and reliability, providing little information about the conditions necessary to achieve acceptable reliability. Seven raters estimated posture angles from video recordings. Different measures of interrater reliability, including percentage agreement, precision expressed as interrater standard deviation, and intraclass correlation coefficients (ICC), were computed. Some posture parameters, such as upper arm flexion and extension, had ICCs ≥ 0.50. Most posture parameters had a precision of around 10 degrees. The predefined categorization and 30-degree posture categorization strategies showed substantially better agreement among the raters than did the 10-degree strategy. Different interrater reliability measures described different aspects of agreement for the posture observation tool. The level of agreement differed substantially between the agreement measures used. Observation of large body parts generally resulted in better reliability. Wider angle intervals resulted in better percentage agreement than narrower intervals. For most postures, 30-degree angle intervals are appropriate. Training aimed at using a properly designed data entry system, and clear posture definitions with relevant examples, including definitions of the neutral positions of the various body parts, will help improve interrater reliability. The results provide ergonomics practitioners with information about the interrater reliability of a postural observation method and guidelines for improving interrater reliability for video-recorded field data.
    Inter-Rater Reliability
    Intra-rater reliability
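The posture study above compares categorization strategies with different angle-interval widths and expresses precision as a between-rater standard deviation. A minimal sketch of both computations follows; the 10-degree versus 30-degree widths mirror the abstract, while the angle data are hypothetical.

```python
import numpy as np

def percent_agreement(rater_angles, width_deg):
    """Proportion of observations on which all raters fall in the same angle category.

    `rater_angles`: raters x observations matrix of estimated angles in degrees;
    categories are consecutive intervals of `width_deg` degrees.
    """
    x = np.asarray(rater_angles, dtype=float)
    bins = np.floor(x / width_deg).astype(int)
    return np.mean((bins == bins[0]).all(axis=0))

def interrater_sd(rater_angles):
    """Precision expressed as the between-rater SD, averaged over observations."""
    x = np.asarray(rater_angles, dtype=float)
    return x.std(axis=0, ddof=1).mean()

# Hypothetical example: 3 raters estimating upper-arm flexion for 4 video frames.
angles = np.array([[12, 35, 62, 88],
                   [18, 31, 55, 95],
                   [ 9, 42, 68, 84]])
print(percent_agreement(angles, width_deg=10))   # narrow intervals: lower agreement
print(percent_agreement(angles, width_deg=30))   # wider intervals: higher agreement
print(interrater_sd(angles))
```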
Because of the increasing popularity and use of performance testing and rating scales in ESL writing, many scholars have stressed the need to pay close attention to the raters themselves. Hence, a number of studies have explored the impact of rater characteristics, rating method, and rating patterns on interrater agreement. There is, however, little research that has attempted to analyze interrater variability at a more micro level and to compare raters’ specific comments to determine whether they arrive at the same rating for the same reasons. Specifically, this paper sought to determine: (1) the level of agreement among raters and the factors that might have influenced the results, (2) the similarities in raters’ reasons for arriving at the same rating, and (3) the rater-related factors that might account for their differences in assessing ESL learners’ written production. Three groups of experienced raters rated 39 essays and provided reasons for arriving at their ratings. Using a mixed-methods approach, the findings revealed that the raters achieved only fair interrater agreement and that they had different reasons for arriving at the same rating. These findings were mainly attributed to the raters’ different scoring foci and rating scales. This study has implications for ESL writing assessment practices and for future studies.
    Inter-Rater Reliability
    Popularity
    Agreement
Objective: The aim of this study was to analyze influences on interrater reliability and within-group agreement within a highly experienced rater group when assessing pilots’ nontechnical skills. Background: Nontechnical skills of pilots are crucial for the conduct of safe flight operations. To train and assess these skills, reliable expert ratings are required. The literature shows, to some degree, that interrater reliability is influenced by factors related to the targets, scenarios, rating tools, or the raters themselves. Method: Thirty-seven type-rating examiners from a European airline assessed the performance of 4 flight crews based on video recordings using LOSA and adapted NOTECHS tools. We calculated rwg and ICC(3) to measure within-group agreement and interrater reliability. Results: The findings indicated that within-group agreement and interrater reliability were not always acceptable. The performance of outstanding pilots was rated with the highest within-group agreement. For cognitive aspects of performance, interrater reliability was higher than for social aspects of performance. Agreement was lower on the pass–fail level than for the distinguished performance scales. Conclusion: These results suggest that pass–fail decisions should not be based exclusively on nontechnical skill ratings. We furthermore recommend that regulatory authorities more systematically address interrater reliability in airline instructor training. Airlines as well as training facilities should be encouraged to demonstrate sufficient interrater reliability when using their rating tools.
    Inter-Rater Reliability
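The flight-crew study above reports ICC(3) alongside rwg. Because ICC(3) treats the examiners as a fixed set and measures consistency rather than absolute agreement, it differs from the ICC(2,1) sketched earlier for the BRIEF-P data. A minimal single-rater ICC(3,1) under those assumptions is sketched below; the ratings are hypothetical.

```python
import numpy as np

def icc_3_1(ratings):
    """ICC(3,1): two-way mixed effects, consistency, single rater (Shrout & Fleiss).

    `ratings`: targets x raters matrix, e.g. recorded crew scenarios rated by examiners.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_total = ((x - grand) ** 2).sum()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()   # between-target variation
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()   # between-rater variation
    ms_rows = ss_rows / (n - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)

# Hypothetical example: 4 recorded crew scenarios rated by 5 examiners on one NOTECHS element.
ratings = np.array([[4, 5, 4, 4, 5],
                    [2, 3, 2, 2, 3],
                    [5, 5, 4, 5, 5],
                    [3, 3, 3, 2, 3]])
print(icc_3_1(ratings))
```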