The Lung Image Database Consortium (LIDC) Data Collection Process for Nodule Detection and Annotation

2007 
Computed tomography (CT) is being investigated for a variety of radiologic tasks involving lung nodules and lung malignancies. These tasks include using low-dose CT as a screening tool for the early detection of lung cancer in high-risk populations (1, 2), evaluating the response of primary and metastatic lung lesions to various therapies (3), and characterizing indeterminate nodules as benign or malignant (4–6). For these applications, radiologists must both identify and characterize lung nodules on large multidetector-row CT scans. This need has motivated interest in and research on computer-aided diagnosis (CAD) methods, and several commercial CAD or CAD-like systems have either already received FDA approval or been submitted for approval. To further stimulate research and development activities in this area, the National Cancer Institute (NCI) formed the Lung Image Database Consortium (LIDC) (7–9). The mission of the LIDC is (a) to develop an image database as a web-accessible international research resource for the development, training, and evaluation of CAD methods for lung cancer detection and diagnosis using CT and (b) to create this database so that the performance of CAD methods for the detection and classification of lung nodules can be correlated with spatial, temporal, and pathological ground truth. The intent of this database is to hasten the advancement of lung nodule CAD research by (1) providing clinical images to investigators who might not have access to patient images and (2) creating a reference database that will support the relative comparison of different CAD systems' performance, thus eliminating database composition as a source of variability in system performance (10). This database requires the collection of an appropriate set of scans and the creation of “truth” for each scan.
The LIDC decided that information about the presence or absence of lung nodules, and the spatial extent of nodules when present, should be provided for each scan in the LIDC database. To obtain the best estimate of spatial truth, expert thoracic radiologists analyzed and annotated each of the collected CT scans. (The LIDC also intends to provide histopathological “truth” for each scan for which these data become available.) Previous research (11–14) has indicated that there is considerable variability among even expert readers in both the detection and boundary delineation of lung nodules on CT (15). Similar variability has been observed in related tasks, including determining nodule size by volume estimation and measuring unidimensional or bidimensional lesion size to assess disease progression (16–21). The issue of inter-reader variability is widely recognized, and the typically accepted solution is to form an expert review panel. This approach usually requires a number of radiologists (typically an odd number greater than or equal to three) to review each scan independently and then, where they disagree, to meet and arrive at a consensus decision. The goal of the LIDC is to have several hundred CT scans annotated by thoracic radiologists at geographically separate centers. Therefore, obtaining spatial truth using ongoing consensus panels seemed to be a difficult, if not impossible, task. A recognized weakness of consensus panels is that they frequently reflect the opinion of the “strongest” member of the panel. In addition, the truth panel approach does not capture the variability and uncertainty between readers, which may be of interest to a wide variety of lung nodule studies.
Therefore, the LIDC designed a two-phase data collection process that would (a) allow multiple expert readers to review each scan; (b) unambiguously express the nodule location and spatial extent information acquired from each review in the form of expert annotations; (c) allow for and express differences between readers in the identification of nodules and the variability in the delineation of nodule boundaries; and (d) allow the data collection process to be performed asynchronously, so that the radiologists need not all review a given scan at the same time. The purpose of this manuscript is to describe the design and implementation of the two-phase reading approach used by the LIDC in its data collection.
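To illustrate requirements (b) and (c), the following is a minimal sketch of how independent per-reader annotations might be stored so that inter-reader variability is preserved rather than collapsed into one consensus outline. It is an illustration only, not the LIDC's actual data format: the names `NoduleMark`, `agreement_map`, and `region_at_level`, the voxel index convention, and the assumption of one mark per reader per nodule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Hypothetical convention: a voxel is indexed as (column, row, slice).
Voxel = Tuple[int, int, int]


@dataclass(frozen=True)
class NoduleMark:
    """One reader's delineation of one nodule (assumed: one mark per reader)."""
    reader_id: str
    voxels: FrozenSet[Voxel]


def agreement_map(marks: Iterable[NoduleMark]) -> Dict[Voxel, int]:
    """Count, for each voxel, how many readers included it in their outline."""
    counts: Dict[Voxel, int] = {}
    for mark in marks:
        for voxel in mark.voxels:
            counts[voxel] = counts.get(voxel, 0) + 1
    return counts


def region_at_level(marks: Iterable[NoduleMark], min_readers: int) -> Set[Voxel]:
    """Return the voxels marked by at least `min_readers` readers."""
    counts = agreement_map(marks)
    return {voxel for voxel, n in counts.items() if n >= min_readers}


# Made-up example data: three readers outline overlapping regions on slice 5.
example_marks = [
    NoduleMark("reader_1", frozenset({(10, 10, 5), (11, 10, 5)})),
    NoduleMark("reader_2", frozenset({(11, 10, 5), (12, 10, 5)})),
    NoduleMark("reader_3", frozenset({(11, 10, 5)})),
]

union_region = region_at_level(example_marks, 1)      # marked by any reader
unanimous_region = region_at_level(example_marks, 3)  # marked by all readers
```

Because every reader's mark is kept separately, a downstream study can choose its own agreement threshold (any reader, a majority, or all readers) instead of inheriting a single panel decision.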