Abdominal multi-organ segmentation of computed tomography (CT) images has been the subject of extensive research interest. It presents a substantial challenge in medical image processing, as the shape and distribution of abdominal organs can vary greatly among the population and within an individual over time. While continuous integration of novel datasets into the training set provides potential for better segmentation performance, collection of data at scale is not only costly, but also impractical in some contexts. Moreover, it remains unclear what marginal value additional data have to offer. Herein, we propose a single-pass active learning method through human quality assurance (QA). We built on a pre-trained 3D U-Net model for abdominal multi-organ segmentation and augmented the dataset either with outlier data (e.g., exemplars for which the baseline algorithm failed) or inliers (e.g., exemplars for which the baseline algorithm worked). The new models were trained using the augmented datasets with 5-fold cross-validation (for outlier data) and withheld outlier samples (for inlier data). Manual labeling of outliers increased Dice scores with outliers by 0.130, compared to an increase of 0.067 with inliers (p<0.001, two-tailed paired t-test). By adding 5 to 37 inliers or outliers to training, we find that the marginal value of adding outliers is higher than that of adding inliers. In summary, improvement on single-organ performance was obtained without diminishing multi-organ performance or significantly increasing training time. Hence, identification and correction of baseline failure cases present an effective and efficient method of selecting training data to improve algorithm performance.
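The Dice scores reported above measure the overlap between a predicted organ mask and a manual label. As a minimal sketch, assuming the standard per-label Dice definition (the abstract does not give the exact evaluation code, and the toy label arrays below are hypothetical), the metric can be computed as follows:

```python
def dice_score(pred, truth, label):
    """Dice coefficient for one organ label between two flat label lists."""
    pred_mask = [p == label for p in pred]
    truth_mask = [t == label for t in truth]
    intersection = sum(p and t for p, t in zip(pred_mask, truth_mask))
    total = sum(pred_mask) + sum(truth_mask)
    # Assumed convention: an organ absent from both masks scores 1.0.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Hypothetical flattened "volumes": 0 = background, 1 = spleen, 2 = liver.
truth = [0, 1, 1, 1, 2, 2, 0, 0]
pred = [0, 1, 1, 0, 2, 2, 2, 2]

spleen = dice_score(pred, truth, 1)  # 2 * 2 / (2 + 3)
liver = dice_score(pred, truth, 2)   # 2 * 2 / (4 + 2)
print(f"spleen Dice = {spleen:.3f}, liver Dice = {liver:.3f}")
```

An improvement such as the reported 0.130 would then be the difference in mean Dice between the retrained and baseline models on the same test scans.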
Background: Levels of plasma SARS-CoV-2 nucleocapsid (N) antigen may be an important biomarker in patients with COVID-19 and enhance our understanding of the pathogenesis of COVID-19. Objective: To evaluate whether levels of plasma antigen can predict short-term clinical outcomes and identify clinical and viral factors associated with plasma antigen levels in hospitalized patients with SARS-CoV-2. Design: Cross-sectional study of baseline plasma antigen level from 2540 participants enrolled in the TICO (Therapeutics for Inpatients With COVID-19) platform trial from August 2020 to November 2021, with additional data on day 5 outcome and time to discharge. Setting: 114 centers in 10 countries. Participants: Adults hospitalized for acute SARS-CoV-2 infection with 12 days or less of symptoms. Measurements: Baseline plasma viral N antigen level was measured at a central laboratory. Delta variant status was determined from baseline nasal swabs using reverse transcriptase polymerase chain reaction. Associations between baseline patient characteristics and viral factors and baseline plasma antigen levels were assessed using both unadjusted and multivariable modeling. Associations between an elevated baseline antigen level of 1000 ng/L or greater and outcomes, including worsening of ordinal pulmonary scale at day 5 and time to hospital discharge, were evaluated using logistic regression and Fine–Gray regression models, respectively. Results: Plasma antigen was below the level of quantification in 5% of participants at enrollment, and 1000 ng/L or greater in 57%. Baseline pulmonary severity of illness was strongly associated with plasma antigen level, with mean plasma antigen level 3.10-fold higher among those requiring noninvasive ventilation or high-flow nasal cannula compared with those on room air (95% CI, 2.22 to 4.34).
Plasma antigen level was higher in those who lacked antispike antibodies (6.42-fold; CI, 5.37 to 7.66) and in those with the Delta variant (1.73-fold; CI, 1.41 to 2.13). Additional factors associated with higher baseline antigen level included male sex, shorter time since hospital admission, fewer days of remdesivir, and renal impairment. In contrast, race, ethnicity, body mass index, and immunocompromising conditions were not associated with plasma antigen levels. Plasma antigen level of 1000 ng/L or greater was associated with markedly higher odds of worsened pulmonary status at day 5 (odds ratio, 5.06 [CI, 3.41 to 7.50]) and longer time to hospital discharge (median, 7 vs. 4 days; subhazard ratio, 0.51 [CI, 0.45 to 0.57]), with subhazard ratios similar across all levels of baseline pulmonary severity. Limitations: Plasma samples were drawn at enrollment, not hospital presentation. No point-of-care test to measure plasma antigen is currently available. Conclusion: Elevated plasma antigen is highly associated with both severity of pulmonary illness and clinically important patient outcomes. Multiple clinical and viral factors are associated with plasma antigen level at presentation. These data support a potential role of ongoing viral replication in the pathogenesis of SARS-CoV-2 in hospitalized patients. Primary Funding Source: U.S. government Operation Warp Speed and National Institute of Allergy and Infectious Diseases.
Human-in-the-loop quality assurance (QA) is typically performed after medical image segmentation to ensure that the systems are performing as intended and to identify and exclude outliers. By performing QA on large-scale, previously unlabeled testing data, categorical QA scores can be generated. In this paper, we propose a semi-supervised multi-organ segmentation deep neural network consisting of a traditional segmentation model as the generator and a QA-involved discriminator. A large-scale dataset of 2027 volumes is used to train the generator, whose 2-D montage images and segmentation masks with QA scores are used to train the discriminator. To generate the QA scores, the 2-D montage images were reviewed manually and coded 0 (success), 1 (errors consistent with published performance), or 2 (gross failure). Then, a ResNet-18 network was trained with 1623 montage images, equally distributed across the three code labels, and achieved an accuracy of 94% for classification predictions on the 404 montage images withheld as the test cohort. To assess the performance of using the QA supervision, the discriminator was used as a loss function in a multi-organ segmentation pipeline. The inclusion of the QA loss function boosted performance on the unlabeled test dataset from 714 patients to 951 patients over the baseline model. Additionally, the number of failures decreased from 606 (29.90%) to 402 (19.83%). The contributions of the proposed method are threefold: we show that (1) the QA scores can be used as a loss function to perform semi-supervised learning on unlabeled data, (2) a well-trained discriminator can be learned from QA scores rather than traditional true/false labels, and (3) multi-organ segmentation on unlabeled datasets can be fine-tuned to be more robust and more accurate than the original baseline method.
Dynamic contrast-enhanced computed tomography (CT) is an imaging technique that provides critical information on the relationship of vascular structure and dynamics in the context of underlying anatomy. A key challenge for image processing with contrast-enhanced CT is that phase discrepancies are latent in different tissues due to contrast protocols, vascular dynamics, and metabolism variance. Previous studies have proposed deep learning frameworks for classifying contrast enhancement, with networks inspired by computer vision. Here, we revisit the challenge in the context of whole-abdomen contrast-enhanced CT. To capture and compensate for the complex contrast changes, we propose a novel discriminator in the form of a multi-domain disentangled representation learning network. The goal of this network is to learn an intermediate representation that separates contrast enhancement from anatomy and enables classification of images with varying contrast time. Briefly, our unpaired contrast disentangling GAN (CD-GAN) discriminator follows the ResNet architecture to classify a CT scan from different enhancement phases. To evaluate the approach, we trained the enhancement phase classifier on 21060 slices from two clinical cohorts of 230 subjects. The scans were manually labeled with three independent enhancement phases (non-contrast, portal venous and delayed). Testing was performed on 9100 slices from 30 independent subjects who had been imaged with CT scans from all contrast phases. Performance was quantified in terms of the multi-class normalized confusion matrix. The proposed network significantly improved on the baseline UNet, ResNet50 and StarGAN, with accuracy scores of 0.54, 0.55, 0.62 and 0.91 for UNet, ResNet50, StarGAN and the proposed CD-GAN, respectively (p<0.0001, paired t-test for ResNet versus CD-GAN). The proposed discriminator from the disentangled network presents a promising technique that may allow deeper modeling of dynamic imaging against patient-specific anatomies.
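The evaluation above relies on a multi-class normalized confusion matrix and an overall accuracy score. The sketch below illustrates both for the three enhancement phases, assuming row normalization (each row sums to 1 over the true phase), which the abstract does not specify. The slice labels are hypothetical toy data, not results from the study.

```python
PHASES = ["non-contrast", "portal venous", "delayed"]


def normalized_confusion(true_labels, pred_labels, classes):
    """Row-normalized confusion matrix: rows are true phases, columns are predictions."""
    index = {c: i for i, c in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, pred_labels):
        counts[index[t]][index[p]] += 1
    rows = []
    for row in counts:
        total = sum(row)
        # A phase with no true samples yields an all-zero row.
        rows.append([c / total if total else 0.0 for c in row])
    return rows


def accuracy(true_labels, pred_labels):
    """Fraction of slices whose predicted phase matches the manual label."""
    correct = sum(t == p for t, p in zip(true_labels, pred_labels))
    return correct / len(true_labels)


# Hypothetical per-slice labels for ten test slices.
true = ["non-contrast"] * 4 + ["portal venous"] * 4 + ["delayed"] * 2
pred = (["non-contrast"] * 3 + ["portal venous"]
        + ["portal venous"] * 3 + ["delayed"]
        + ["delayed", "portal venous"])

matrix = normalized_confusion(true, pred, PHASES)
acc = accuracy(true, pred)
for phase, row in zip(PHASES, matrix):
    print(phase, [round(v, 2) for v in row])
print("accuracy:", acc)
```

The diagonal of the matrix gives the per-phase recall, which is useful when the phases are unevenly represented in the test slices.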
Background: Persistent mortality in adults hospitalized due to acute COVID-19 justifies pursuit of disease mechanisms and potential therapies. The aim was to evaluate which virus and host response factors were associated with mortality risk among participants in Therapeutics for Inpatients with COVID-19 (TICO/ACTIV-3) trials. Methods: A secondary analysis of 2625 adults hospitalized for acute SARS-CoV-2 infection randomized to 1 of 5 antiviral products or matched placebo in 114 centers on 4 continents. Uniform, site-level collection of participant baseline clinical variables was performed. Research laboratories assayed baseline upper respiratory swabs for SARS-CoV-2 viral RNA and plasma for anti–SARS-CoV-2 antibodies, SARS-CoV-2 nucleocapsid antigen (viral Ag), and interleukin-6 (IL-6). Associations between factors and time to mortality by 90 days were assessed using univariate and multivariable Cox proportional hazards models. Results: Viral Ag ≥4500 ng/L (vs <200 ng/L; adjusted hazard ratio [aHR], 2.07; 1.29–3.34), viral RNA (<35 000 copies/mL [aHR, 2.42; 1.09–5.34], ≥35 000 copies/mL [aHR, 2.84; 1.29–6.28], vs below detection), respiratory support (<4 L O2 [aHR, 1.84; 1.06–3.22]; ≥4 L O2 [aHR, 4.41; 2.63–7.39], or noninvasive ventilation/high-flow nasal cannula [aHR, 11.30; 6.46–19.75] vs no oxygen), renal impairment (aHR, 1.77; 1.29–2.42), and IL-6 >5.8 ng/L (aHR, 2.54 [1.74–3.70] vs ≤5.8 ng/L) were significantly associated with mortality risk in final adjusted analyses. Viral Ag, viral RNA, and IL-6 were not measured in real time. Conclusions: Baseline virus-specific, clinical, and biological variables are strongly associated with mortality risk within 90 days, revealing potential pathogen and host-response therapeutic targets for acute COVID-19 disease.
Human-in-the-loop quality assurance (QA) is typically performed after medical image segmentation to ensure that the systems are performing as intended and to identify and exclude outliers. By performing QA on large-scale, previously unlabeled testing data, categorical QA scores (e.g., "successful" versus "unsuccessful") can be generated. Unfortunately, although human-in-the-loop QA consumes precious resources, the resulting QA scores are not typically reused in medical image machine learning, especially to train a deep neural network for image segmentation. Herein, we perform a pilot study to investigate whether the QA labels can be used as supplementary supervision to augment the training process in a semi-supervised fashion. In this paper, we propose a semi-supervised multi-organ segmentation deep neural network consisting of a traditional segmentation model as the generator and a QA-involved discriminator. An existing 3-D abdominal segmentation network is employed as the generator, while a pre-trained ResNet-18 network is used as the discriminator. A large-scale dataset of 2027 volumes is used to train the generator, whose 2-D montage images and segmentation masks with QA scores are used to train the discriminator. To generate the QA scores, the 2-D montage images were reviewed manually and coded 0 (success), 1 (errors consistent with published performance), or 2 (gross failure). Then, the ResNet-18 network was trained with 1623 montage images, equally distributed across the three code labels, and achieved an accuracy of 94% for classification predictions on the 404 montage images withheld as the test cohort. To assess the performance of using the QA supervision, the discriminator was used as a loss function in a multi-organ segmentation pipeline. The inclusion of the QA loss function boosted performance on the unlabeled test dataset from 714 patients to 951 patients over the baseline model. Additionally, the number of failures decreased from 606 (29.90%) to 402 (19.83%).
The contributions of the proposed method are threefold: we show that (1) the QA scores can be used as a loss function to perform semi-supervised learning on unlabeled data, (2) a well-trained discriminator can be learned from QA scores rather than traditional "true/false" labels, and (3) multi-organ segmentation on unlabeled datasets can be fine-tuned to be more robust and more accurate than the original baseline method. The use of QA-inspired loss functions represents a promising area of future research and may permit tighter integration of supervised and semi-supervised learning.
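To illustrate how a QA-score classifier could act as a loss term, the sketch below assumes one plausible formulation: the discriminator outputs three logits for QA codes 0, 1 and 2, and the segmentation network is penalized by the cross-entropy against code 0 (success). The abstract does not state the exact form of the QA loss, so `qa_loss`, `total_loss`, and the `weight` parameter are hypothetical names and choices, not the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def qa_loss(qa_logits):
    """Assumed QA loss: cross-entropy against QA code 0 ("success").

    qa_logits holds the discriminator outputs for codes 0, 1 and 2.
    """
    probs = softmax(qa_logits)
    return -math.log(probs[0])


def total_loss(supervised_loss, qa_logits, weight=0.1):
    """Supervised segmentation loss plus a weighted QA term (weight is hypothetical)."""
    return supervised_loss + weight * qa_loss(qa_logits)


# A montage the discriminator rates as a likely success is penalized less
# than one it rates as a likely gross failure.
likely_success = qa_loss([5.0, 0.0, 0.0])
likely_failure = qa_loss([0.0, 0.0, 5.0])
print(likely_success < likely_failure)
```

For unlabeled volumes there is no supervised term, so under this assumption only the QA term would drive the update; in a real pipeline the montage rendering and the discriminator would both need to be differentiable with respect to the segmentation output.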
Neutralizing monoclonal antibodies (nmAbs) failed to show clear benefit for hospitalized patients with coronavirus disease 2019 (COVID-19). Dynamics of virologic and immunologic biomarkers remain poorly understood.
Segmentation of abdominal computed tomography (CT) provides spatial context, morphological properties, and a framework for tissue-specific radiomics to guide quantitative radiological assessment. A 2015 MICCAI challenge spurred substantial innovation in multi-organ abdominal CT segmentation with both traditional and deep learning methods. Recent innovations in deep methods have driven performance toward levels for which clinical translation is appealing. However, continued cross-validation on open datasets presents the risk of indirect knowledge contamination and could result in circular reasoning. Moreover, "real world" segmentations can be challenging due to the wide variability of abdomen physiology within patients. Herein, we perform two data retrievals to capture clinically acquired deidentified abdominal CT cohorts for evaluating a recently published variation on 3D U-Net (the baseline algorithm). First, we retrieved 2004 deidentified studies on 476 patients with diagnosis codes involving spleen abnormalities (cohort A). Second, we retrieved 4313 deidentified studies on 1754 patients without diagnosis codes involving spleen abnormalities (cohort B). We performed prospective evaluation of the existing algorithm on both cohorts, yielding failure rates of 13% and 8%, respectively. Then, we identified 51 subjects in cohort A with segmentation failures and manually corrected the liver and gallbladder labels. We re-trained the model with the added manual labels, which improved performance to failure rates of 9% and 6% for cohorts A and B, respectively. In summary, the performance of the baseline on the prospective cohorts was similar to that on previously published datasets. Moreover, adding data from the first cohort substantively improved performance when evaluated on the second withheld validation cohort.
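As a small worked example of the failure rates above, the relative reduction for each cohort can be computed directly from the reported percentages. The helper name `relative_reduction` is introduced here for illustration, and because the reported rates are rounded to whole percentages, the results are approximate.

```python
def relative_reduction(before, after):
    """Relative drop in failure rate, as a fraction of the original rate."""
    return (before - after) / before


# Failure rates reported for the baseline and the retrained model.
cohort_a = relative_reduction(0.13, 0.09)  # cohort A: 13% down to 9%
cohort_b = relative_reduction(0.08, 0.06)  # cohort B: 8% down to 6%

print(f"cohort A: {cohort_a:.0%} fewer failures")
print(f"cohort B: {cohort_b:.0%} fewer failures")
```

On these rounded figures, failures fall by roughly 31% in cohort A and 25% in cohort B, even though the manual corrections came only from cohort A.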
Dynamic contrast-enhanced computed tomography (CT) is an imaging technique that provides critical information on the relationship of vascular structure and dynamics in the context of underlying anatomy. A key challenge for image processing with contrast-enhanced CT is that phase discrepancies are latent in different tissues due to contrast protocols, vascular dynamics, and metabolism variance. Previous studies have proposed deep learning frameworks for classifying contrast enhancement, with networks inspired by computer vision. Here, we revisit the challenge in the context of whole-abdomen contrast-enhanced CT. To capture and compensate for the complex contrast changes, we propose a novel discriminator in the form of a multi-domain disentangled representation learning network. The goal of this network is to learn an intermediate representation that separates contrast enhancement from anatomy and enables classification of images with varying contrast time. Briefly, our unpaired contrast disentangling GAN (CD-GAN) discriminator follows the ResNet architecture to classify a CT scan from different enhancement phases. To evaluate the approach, we trained the enhancement phase classifier on 21060 slices from two clinical cohorts of 230 subjects. Testing was performed on 9100 slices from 30 independent subjects who had been imaged with CT scans from all contrast phases. Performance was quantified in terms of the multi-class normalized confusion matrix. The proposed network significantly improved on the baseline UNet, ResNet50 and StarGAN, with accuracy scores of 0.54, 0.55, 0.62 and 0.91 for UNet, ResNet50, StarGAN and the proposed CD-GAN, respectively. The proposed discriminator from the disentangled network presents a promising technique that may allow deeper modeling of dynamic imaging against patient-specific anatomies.