Development of an unsupervised machine learning algorithm for the prognostication of walking ability in spinal cord injury patients.

Zachary DeVries, Mohamad Hoda, Carly S. Rivers, Audrey Maher, Eugene Wai, Dita Moravek, Alexandra Stratton, Stephen P. Kingwell, Nader Fallah, Jerome Paquet, Philippe Phan

2019

Abstract

Background Context: Traumatic spinal cord injury can have a dramatic effect on a patient's life. The degree of neurological recovery greatly influences a patient's treatment and expected quality of life. This has resulted in the development of machine learning algorithms (MLA) that use acute demographic and neurological information to prognosticate recovery. The van Middendorp et al. (2011) (vM) logistic regression (LR) model has been established as a reference model for the prediction of walking recovery following spinal cord injury, as it has been validated in many different countries. However, an examination of the way in which these prediction models are evaluated is warranted. The area under the receiver operating characteristic curve (AUROC) has been consistently used when evaluating model performance, but it has been shown that AUROC overemphasizes the most common event, resulting in an inaccurate assessment when the data are imbalanced. Furthermore, there is evidence that more advanced MLA, such as an unsupervised k-means model, may show superior performance compared to LR because they can handle a larger number of features.

Purpose: The first objective of the study was to assess the performance of both an unsupervised MLA and an LR model with complete admission neurological information against the vM and Hicks models. The second objective was to compare the accuracy of the AUROC and the F1-score to determine which method is superior for assessing the diagnostic performance of prediction models on large-scale datasets.

Study Design: Retrospective review of a prospective cohort study.

Patient Sample: The Rick Hansen Spinal Cord Injury Registry (RHSCIR) was used in this study. All patients enrolled between 2004 and 2017 with complete neurological examination and Functional Independence Measure (FIM) outcome data at ≥1 year follow-up, or who could walk at discharge, were included. The prognostic variables included: age (dichotomized at ≥65 years old); American Spinal Injury Association Impairment Scale (AIS) grade; and individual motor, light touch, and pinprick scores from L2-S1.

Outcome Measures: The FIM locomotor score was used to assess independent walking ability at discharge or 1-year follow-up.

Methods: An unsupervised MLA with k=2 was chosen in order to identify a “walk” cluster and a “not walk” cluster. Model performance was assessed through the development of a receiver operating characteristic curve with associated AUROC and a precision-recall curve with associated F1-score. The study and the RHSCIR are supported by funding from Health Canada, Western Economic Diversification Canada, and the Governments of Alberta, British Columbia, Manitoba, and Ontario. These funders had no role in the study or study reporting, and the authors have no conflicts of interest to report.

Results: No clinically relevant differences were found between the use of an unsupervised MLA with a greater amount of initial neurological information and the established standards for any AIS classification. Although this was demonstrated for each separate AIS classification, most notably the AUROCs for the vM (0.78) and Hicks (0.76) models were found to be superior to that of the new LR model (0.72); however, the vM and Hicks models had more than double the number of false negative classifications compared to the LR model. The F1-scores of these three models also differed, but with the vM and Hicks models scoring lower than the LR model (0.85, 0.81, and 0.89, respectively).

Conclusions: No clinically relevant differences were found between the use of an unsupervised MLA with complete admission neurological information and the previously validated standards; however, when comparing the performance of the AUROC and F1-score, the AUROC showed inaccurate prognostic performance when there was an imbalance towards a greater number of false negatives. Importantly, the F1-score did not succumb to this imbalance. As AUROC has been used as the standard when evaluating the performance of prediction models, consideration as to whether this is the most appropriate method is warranted. Future work should focus on comparing AUROC and F1-scores with other previously validated models.
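To make the two evaluation metrics discussed in the abstract concrete, the following is a minimal sketch of how an F1-score and an AUROC can be computed from binary "walk" / "not walk" predictions. It is not the authors' code: the function names (`confusion_counts`, `f1_score`, `auroc`), the 0.5 decision threshold, and the eight-patient toy dataset are all assumptions introduced purely for illustration, and the numbers bear no relation to the RHSCIR results.

```python
def confusion_counts(labels, scores, threshold=0.5):
    """Count TP, FP, FN, TN when scores >= threshold are predicted as 'walk' (1)."""
    tp = fp = fn = tn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; true negatives do not enter."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auroc(labels, scores):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win. This pairwise form is equivalent to the area
    under the ROC curve and does not depend on any decision threshold.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical, imbalanced toy data: 6 walkers (1) and 2 non-walkers (0).
labels = [1, 1, 1, 1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.35, 0.1]

tp, fp, fn, tn = confusion_counts(labels, scores, threshold=0.5)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")          # TP=4 FP=0 FN=2 TN=2
print(f"F1    = {f1_score(tp, fp, fn):.3f}")       # 0.800
print(f"AUROC = {auroc(labels, scores):.3f}")      # 0.917
```

In this toy case the two false negatives lower recall, and therefore the F1-score, directly, whereas the AUROC reflects only how well walkers are ranked above non-walkers across all thresholds. The sketch illustrates why the two metrics can disagree; it does not reproduce the comparison reported in the study.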
