Self-channel-and-spatial-attention neural network for automated multi-organ segmentation on head and neck CT images

2020 
Accurate segmentation of organs-at-risk (OARs) is necessary for adaptive head and neck (H&N) cancer treatment planning, but manual delineation is tedious, slow, and inconsistent. A Self-Channel-and-Spatial-Attention neural network (SCSA-Net) is developed for H&N OAR segmentation on CT images. To ease training and improve segmentation performance at the same time, the proposed SCSA-Net exploits the self-attention ability of the network. Spatial and channel-wise attention learning mechanisms are both employed so that the network adaptively emphasizes meaningful features and suppresses irrelevant ones. The proposed network was first evaluated on a public dataset of 48 patients, then on a separate serial CT dataset of ten patients who received weekly diagnostic fan-beam CT scans. On the second dataset, the accuracy of using SCSA-Net to track parotid and submandibular gland volume changes during radiotherapy treatment was quantified. Dice similarity coefficient (DSC), positive predictive value (PPV), sensitivity (SEN), average surface distance (ASD), and 95% maximum surface distance (95SD) were calculated on the brainstem, optic chiasm, optic nerves, mandible, parotid glands, and submandibular glands to evaluate the proposed SCSA-Net. The proposed SCSA-Net consistently outperforms the state-of-the-art methods on the public dataset. Specifically, compared with Res-Net and SE-Net (the latter built from Residual blocks equipped with Squeeze-and-Excitation blocks), SCSA-Net improves the DSC of the optic nerves by 0.06 and 0.03, and the DSC of the submandibular glands by 0.05 and 0.04, respectively. Moreover, the proposed method achieves statistically significant DSC improvements on all 9 OARs over Res-Net and on 8 of 9 OARs over SE-Net.
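The three overlap metrics named above (DSC, PPV, SEN) have standard definitions in terms of true positives, false positives, and false negatives. The following is a minimal NumPy sketch of those standard definitions, not the authors' evaluation code; the function name `overlap_metrics` and the toy masks are hypothetical, and it assumes both masks contain at least one foreground voxel so that no denominator is zero.

```python
import numpy as np

def overlap_metrics(pred, truth):
    """Return (DSC, PPV, SEN) for two binary masks of equal shape.

    Assumes each mask has at least one foreground voxel.
    """
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.logical_and(pred, truth).sum()    # organ in both masks
    fp = np.logical_and(pred, ~truth).sum()   # organ only in the prediction
    fn = np.logical_and(~pred, truth).sum()   # organ only in the reference
    dsc = 2 * tp / (2 * tp + fp + fn)         # Dice similarity coefficient
    ppv = tp / (tp + fp)                      # positive predictive value
    sen = tp / (tp + fn)                      # sensitivity
    return float(dsc), float(ppv), float(sen)

# Toy 1D masks standing in for 3D segmentations (made-up values).
pred = np.array([1, 1, 1, 0, 0, 0])
truth = np.array([0, 1, 1, 1, 0, 0])
dsc, ppv, sen = overlap_metrics(pred, truth)
print(f"DSC={dsc:.3f} PPV={ppv:.3f} SEN={sen:.3f}")
```

The same function applies unchanged to 3D arrays, since the counts are taken over all voxels regardless of shape.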
The trained network achieved good segmentation results on the serial dataset, and the results improved further after the model was fine-tuned on the simulation CT images. For the parotid and submandibular glands, the volume changes of individual patients are highly consistent between automated and manual segmentation (Pearson correlation 0.97-0.99). The proposed SCSA-Net is also computationally efficient, segmenting a CT in about 2 seconds.
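As an illustration of how agreement between automated and manual volume tracking could be quantified, the sketch below computes a Pearson correlation between two series of volume changes. It is a sketch under stated assumptions: the abstract does not say how volume change was defined, so change relative to the first scan is assumed here, and the helper `pearson_r` and all volumes are hypothetical, made-up values rather than data from the study.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()  # center each series
    yc = y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

# Made-up weekly gland volumes in cm^3 for one hypothetical patient.
manual_cm3 = np.array([30.0, 28.5, 27.0, 25.0, 24.0])
auto_cm3 = np.array([29.0, 28.0, 26.0, 25.5, 23.0])

# Assumption: volume change is measured relative to the first scan.
manual_change = manual_cm3[1:] - manual_cm3[0]
auto_change = auto_cm3[1:] - auto_cm3[0]

r = pearson_r(manual_change, auto_change)
print(f"Pearson r = {r:.3f}")
```

A value of `r` close to 1 means the automated contours follow the same shrinkage trend as the manual ones, even if the absolute volumes differ by an offset.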