[Performance assessment of mammographic diagnostic systems: evolution of methods and their application to a digital image study].

1999 
INTRODUCTION: "Receiver Operating Characteristic" (ROC) curves are among the most efficient tools for the complete evaluation of diagnostic system performance. However, the method is limited when abnormal structures, such as clusters of microcalcifications on mammographic images, must be visualized and located. More refined and complex techniques, which assume particular statistical hypotheses, have therefore been suggested: the "Free-response ROC" (FROC), the "Alternative FROC" (AFROC) and the "Free-response Forced Error" (FFE) analyses. We studied the theoretical bases of these methods and applied them experimentally to assess the correctness of the hypothesized statistical distributions. MATERIAL AND METHODS: We considered two statistical hypotheses: first, that the false-positive responses follow a Poisson distribution; second, that the "signal" and "noise" distributions are Gaussian with different means and variances. We then applied the methods to the responses given by 8 observers (5 radiologists and 3 medical physicists) who independently evaluated 3 digital mammographic samples. Each sample consisted of 39 images with 1-15 clusters each (total: 100 clusters). Sample 1 comprised 39 images available in an Internet database; samples 2 and 3 were obtained by applying 2 different digital filters to each image. Responses were collected in two phases: in the first, each observer visualized and located the clusters at a given confidence level; in the second, once a false-positive response had been given, spontaneously or after forcing, the responses were ordered by decreasing conspicuity. Finally, the data were analyzed with in-house software, applying the FROC and AFROC analyses to the data collected in phase 1 and the FFE analysis to those collected in phase 2.
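The first hypothesis links the FROC and AFROC abscissas: if the number of false positives per image is Poisson distributed with mean λ, the probability that an image carries at least one false positive is 1 - exp(-λ). The Python sketch below illustrates that conversion only; it is not the software used in the study, and the function names and operating points are hypothetical.

```python
import math


def fp_image_fraction(mean_fp_per_image: float) -> float:
    """Probability of at least one false positive on an image,
    assuming false positives per image are Poisson distributed."""
    if mean_fp_per_image < 0:
        raise ValueError("mean false positives per image must be non-negative")
    return 1.0 - math.exp(-mean_fp_per_image)


def froc_to_afroc(froc_points):
    """Map FROC points (mean FPs per image, true-positive fraction)
    to AFROC points (false-positive image fraction, true-positive fraction)."""
    return [(fp_image_fraction(lam), tpf) for lam, tpf in froc_points]


# Hypothetical operating points for illustration, not data from the study.
example_froc = [(0.0, 0.0), (0.5, 0.60), (1.0, 0.75), (2.0, 0.85)]
example_afroc = froc_to_afroc(example_froc)
```

The mapping compresses the unbounded FROC abscissa into the interval [0, 1), which is what makes an area under the AFROC curve well defined.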
RESULTS: We considered the area under the AFROC curve as the most important parameter: the values obtained with the 3 types of analysis agree well within their uncertainties. In particular, the FROC-AFROC difference never exceeded 5.9% (within 2.5% in 10 of 14 cases), while the FFE analysis yielded higher standard deviations for the area value (about 10%). The curves interpolated from the FROC and AFROC data were very similar. Each of the three methods had its advantages: the FFE is very simple to calculate and makes the most of the information given by the observer; FROC and AFROC can handle true-positive and false-positive responses on the same image, which allows the evaluation of diagnostic system performance to be optimized. The statistical tools used in the simplest methods are thus usually complemented by a complete account of the location of multiple signals on mammograms. CONCLUSIONS: In theory, every method is necessary because it provides additional information to validate the statistical hypotheses under investigation. In practice, when the methods are used to evaluate and compare several diagnostic systems, the results of the three techniques are equivalent. Therefore, the choice of a specific technique depends on both the available resources and the response type. All the hypothetical statistical distributions in our study proved correct.
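As an illustration of the figure of merit, the Python sketch below computes an area under a set of AFROC points and the percent difference between two such areas. It rests on assumptions not stated in the abstract: the area is taken by the trapezoidal rule on empirical points closed with (0, 0) and (1, 1), whereas the study worked with interpolated curves, and the percent difference is taken relative to the mean of the two areas. All names and values are hypothetical.

```python
def afroc_area(points):
    """Trapezoidal area under (false-positive fraction, true-positive fraction)
    points. Assumption: the curve is closed with (0, 0) and (1, 1)."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def percent_difference(a, b):
    """Absolute difference between two areas as a percentage of their mean
    (an assumed definition; the abstract does not specify one)."""
    return 100.0 * abs(a - b) / ((a + b) / 2.0)


# Hypothetical AFROC points for illustration, not data from the study.
example_area = afroc_area([(0.2, 0.6), (0.5, 0.8)])
```

With no operating points the function returns the chance-diagonal area of 0.5, which gives a quick sanity check on the closing convention.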