Development of a novel machine learning model to predict presence of nonalcoholic steatohepatitis.

2021 
OBJECTIVE To develop a computer model to predict patients with nonalcoholic steatohepatitis (NASH) using machine learning (ML). MATERIALS AND METHODS This retrospective study utilized two databases: a) the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) nonalcoholic fatty liver disease (NAFLD) adult database (2004-2009), and b) the Optum® de-identified Electronic Health Record dataset (2007-2018), a real-world dataset representative of common electronic health records in the United States. We developed an ML model to predict NASH, using confirmed NASH and non-NASH based on liver histology results in the NIDDK dataset to train the model. RESULTS Models were trained and tested on NIDDK NAFLD data (704 patients) and the best-performing models evaluated on Optum data (~3,000,000 patients). An eXtreme Gradient Boosting model (XGBoost) consisting of 14 features exhibited high performance as measured by area under the curve (0.82), sensitivity (81%), and precision (81%) in predicting NASH. Slightly reduced performance was observed with an abbreviated feature set of 5 variables (0.79, 80%, 80%, respectively). The full model demonstrated good performance (AUC 0.76) to predict NASH in Optum data. DISCUSSION The proposed model, named NASHmap, is the first ML model developed with confirmed NASH and non-NASH cases as determined through liver biopsy and validated on a large, real-world patient dataset. Both the 14 and 5-feature versions exhibit high performance. CONCLUSION The NASHmap model is a convenient and high performing tool that could be used to identify patients likely to have NASH in clinical settings, allowing better patient management and optimal allocation of clinical resources.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    17
    References
    3
    Citations
    NaN
    KQI
    []