Accurate classification of pediatric colonic IBD subtype using a random forest machine learning classifier.

2020 
BACKGROUND The paediatric inflammatory bowel disease (PIBD) classes algorithm was developed to bring consistency to the labelling of colonic IBD, but its labels are based exclusively on features atypical for ulcerative colitis (UC).

AIM The aim of the study was to develop an algorithm and identify features that discriminate between paediatric UC and colonic Crohn disease (CD).

METHODS Baseline clinical, endoscopic, radiologic, and histologic data, including the PIBD class features, were collected for 74 patients with colonic IBD (56 UC, 18 colonic CD). The PIBD class features and additional features common to UC were used to perform initial clustering with similarity network fusion (SNF). We trained a random forest (RF) classifier on the full dataset and used a leave-one-out approach to evaluate model accuracy. The top features were used to build a new classifier, which we tested on 15 previously unused patients. We then performed clustering with SNF on the top RF features and assessed its ability to discriminate between UC and colonic CD independently of a supervised model.

RESULTS The initial SNF clustering of 58 patients demonstrated 2 groups: group 1 (n = 39, 90% UC) and group 2 (n = 19, 68% colonic CD). Our RF classifier correctly labelled 97% of the 58 patients based on leave-one-out cross-validation and identified the 7 most important features (3 histological and 4 endoscopic) for clinically distinguishing these groups. We trained a new RF classifier with the top 7 features and found 100% accuracy in a set of 15 held-out patients. Finally, post hoc clustering with these 7 features revealed 2 groups of patients: group 1 (n = 55, 98% UC) and group 2 (n = 18, 94% colonic CD).

CONCLUSIONS A combination of supervised and unsupervised analyses identified a short list of features that consistently distinguish UC from colonic CD. Future directions include validation in other populations.
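The abstract does not state the authors' software, hyperparameters, or importance measure. What follows is a minimal, stdlib-only Python sketch of the supervised steps it outlines: fit a random forest, estimate accuracy with leave-one-out, rank features, retrain on the top 7, and score a held-out set. It assumes binary present/absent features. Everything else is a hypothetical illustration and not the study's method: the toy cohort from `make_synthetic_cohort`, the settings `n_trees=50` and `max_depth=4`, and the use of mean decrease in Gini impurity as the importance score. SNF clustering is not shown.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure or empty node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    """Most frequent label; ties go to the label encountered first."""
    return Counter(labels).most_common(1)[0][0]

def build_tree(X, y, rows, depth, max_depth, n_sub, rng, importance):
    """Grow one tree on binary (0/1) features. Returns a leaf label or a tuple
    (feature, subtree_if_0, subtree_if_1); `importance` collects size-weighted Gini decrease."""
    labels = [y[i] for i in rows]
    parent = gini(labels)
    if depth == max_depth or parent == 0.0:
        return majority(labels)
    best = None
    for f in rng.sample(range(len(X[0])), n_sub):  # random candidate features per split
        left = [i for i in rows if X[i][f] == 0]
        right = [i for i in rows if X[i][f] == 1]
        if not left or not right:
            continue  # feature is constant within this node
        child = (len(left) * gini([y[i] for i in left])
                 + len(right) * gini([y[i] for i in right])) / len(rows)
        if best is None or parent - child > best[0]:
            best = (parent - child, f, left, right)
    if best is None or best[0] <= 0.0:
        return majority(labels)
    gain, f, left, right = best
    importance[f] += gain * len(rows)
    return (f,
            build_tree(X, y, left, depth + 1, max_depth, n_sub, rng, importance),
            build_tree(X, y, right, depth + 1, max_depth, n_sub, rng, importance))

def predict_tree(node, x):
    """Follow splits until a leaf label is reached."""
    while isinstance(node, tuple):
        f, if_0, if_1 = node
        node = if_1 if x[f] == 1 else if_0
    return node

def fit_forest(X, y, n_trees=50, max_depth=4, seed=0):
    """Bagged trees with sqrt(p) candidate features per split; returns (trees, importances)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    n_sub = max(1, int(p ** 0.5))
    importance = [0.0] * p
    trees = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]  # bootstrap sample with replacement
        trees.append(build_tree(X, y, rows, 0, max_depth, n_sub, rng, importance))
    total = sum(importance)
    return trees, [v / total if total else 0.0 for v in importance]  # normalised to sum to 1

def predict_forest(trees, x):
    """Majority vote over the trees."""
    return majority([predict_tree(t, x) for t in trees])

def loocv_accuracy(X, y, **kw):
    """Leave-one-out: refit without patient i, then predict patient i."""
    hits = 0
    for i in range(len(X)):
        train = [j for j in range(len(X)) if j != i]
        trees, _ = fit_forest([X[j] for j in train], [y[j] for j in train], **kw)
        hits += predict_forest(trees, X[i]) == y[i]
    return hits / len(X)

def top_k(importance, k):
    """Indices of the k most important features."""
    return sorted(range(len(importance)), key=lambda f: -importance[f])[:k]

def make_synthetic_cohort(n, p, rng):
    """HYPOTHETICAL toy data: only features 0-2 differ between UC and CD."""
    X, y = [], []
    for _ in range(n):
        label = "CD" if rng.random() < 0.3 else "UC"
        signal = 0.8 if label == "CD" else 0.15  # P(feature present) for features 0-2
        X.append([int(rng.random() < (signal if f < 3 else 0.5)) for f in range(p)])
        y.append(label)
    return X, y

def select(rows, keep):
    """Restrict each feature vector to the chosen feature indices."""
    return [[r[f] for f in keep] for r in rows]

if __name__ == "__main__":
    rng = random.Random(42)
    X, y = make_synthetic_cohort(58, 12, rng)          # toy development cohort
    X_new, y_new = make_synthetic_cohort(15, 12, rng)  # toy held-out cohort
    print("LOOCV accuracy, all features:", loocv_accuracy(X, y))
    keep = top_k(fit_forest(X, y)[1], 7)               # rank features by importance
    trees, _ = fit_forest(select(X, keep), y)          # retrain on the top 7 only
    preds = [predict_forest(trees, x) for x in select(X_new, keep)]
    acc = sum(a == b for a, b in zip(preds, y_new)) / len(y_new)
    print("top-7 features:", keep, "held-out accuracy:", acc)
```

In practice, scikit-learn's `RandomForestClassifier` (with its `feature_importances_` attribute) combined with `LeaveOneOut` would be the more usual choice. The stdlib version above only keeps every step visible. Because the toy data are random, the printed accuracies say nothing about the study's results.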