A machine-learning approach for stratifying breast cancer risk in Chinese women on the basis of epidemiological factors and mammographic density

2019 
Abstract

Background: An accurate model to stratify breast cancer risk is required to move from a one-size-fits-all screening paradigm to more personalised, risk-based screening strategies. Most risk models have limited accuracy and have been calibrated for white women. Considering the rapid increase in breast cancer incidence in China, in this study we aimed to develop a breast cancer risk model for Chinese women.

Methods: We recruited women from Fudan University Shanghai Cancer Centre. Information about their height, weight, body-mass index, age at menarche, age at first delivery, age at menopause, menopause status, parity history, number of children, breastfeeding history, personal history of breast cancer, family history of breast cancer, and degree of consanguinity was obtained. Mammograms were acquired from all recruited women. We used AutoDensity software to segment the mammographic dense area. To ensure that the breast density had not been affected by the presence of the lesion, we measured the mammographic breast density of patients with cancer from the cancer-free breast contralateral to the biopsy-proven cancer. The breast area, dense area, percentage density, and demographic variables were input into an ensemble of 50 decision trees, the results of which we combined using a hybrid sampling and boosting algorithm (RUSBoost). The variables were also fed into logistic and stepwise logistic regression models, which are two commonly used linear models. Leave-one-out cross-validation was used to assess the performance of the models, and receiver operating characteristic (ROC) curves were generated for each model.

Findings: In total, 1079 women (85 women with biopsy-proven breast cancer and 994 women without breast cancer) were recruited. The classifier had an area under the ROC curve (AUC) of 0·86 (95% CI 0·82–0·89) for classifying the cases as normal or high-risk. It outperformed the traditionally employed logistic and stepwise logistic regression models, which had AUC values of 0·739 (0·67–0·81) and 0·757 (0·69–0·81), respectively.

Interpretation: When compared with linear models, a machine-learning approach, which models non-linearity in the data, could improve the accuracy of a breast cancer risk stratification model. Such a model can be used to recommend intensified surveillance for women at a high risk of future breast cancer.

Funding: None.
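The evaluation procedure described in the Methods can be illustrated with a minimal, standard-library-only Python sketch. It is not the study's code. It shows three of the ingredients named above: percentage density as dense area over breast area, the random undersampling step that gives RUSBoost its name, and leave-one-out cross-validation scored by AUC. The boosted ensemble of 50 decision trees is deliberately replaced by a much simpler nearest-centroid scorer, and all function names and data values are hypothetical and invented for illustration.

```python
import random


def percentage_density(dense_area, breast_area):
    """Dense area expressed as a percentage of the total breast area."""
    return 100.0 * dense_area / breast_area


def undersample(rows, labels, rng):
    """Random undersampling: keep every minority-class row and an
    equally sized random subset of the majority class."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = minority + rng.sample(majority, len(minority))
    return [rows[i] for i in keep], [labels[i] for i in keep]


def centroid(rows):
    """Feature-wise mean of a non-empty list of rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def centroid_score(train_rows, train_labels, x):
    """Stand-in scorer (NOT RUSBoost): positive when x lies closer to the
    case centroid than to the control centroid."""
    c1 = centroid([r for r, y in zip(train_rows, train_labels) if y == 1])
    c0 = centroid([r for r, y in zip(train_rows, train_labels) if y == 0])
    d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
    d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
    return d0 - d1


def leave_one_out_scores(rows, labels, seed=0):
    """Score each row with a model fitted on all the other rows.
    Assumes at least two rows per class, so neither class is ever empty."""
    rng = random.Random(seed)
    scores = []
    for i in range(len(rows)):
        train_rows = rows[:i] + rows[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        bal_rows, bal_labels = undersample(train_rows, train_labels, rng)
        scores.append(centroid_score(bal_rows, bal_labels, rows[i]))
    return scores


def auc(labels, scores):
    """AUC as the probability that a random case outscores a random
    control, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic toy data, invented for illustration (not from the study):
# each row is [percentage density, body-mass index]; 1 = case, 0 = control.
toy_rows = [[12.0, 21.5], [15.0, 23.0], [10.0, 22.0], [18.0, 24.5],
            [14.0, 20.5], [11.0, 25.0], [35.0, 23.5], [40.0, 22.5],
            [38.0, 24.0]]
toy_labels = [0, 0, 0, 0, 0, 0, 1, 1, 1]
toy_scores = leave_one_out_scores(toy_rows, toy_labels)
print(f"leave-one-out AUC on toy data: {auc(toy_labels, toy_scores):.2f}")
```

In a fuller reproduction, the stand-in scorer would be replaced by a boosted ensemble of decision trees that undersamples the majority class at every boosting round, which is what distinguishes RUSBoost from the single undersampling pass used here.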