Predictive Modeling for Differential Diagnosis and Mortality Risk Assessment

Tony Lindsey, Saul J Vega, Sena Veazey, Jose Salinas

2019

The prevalence of electronic health record (EHR) systems has created substantial opportunities for biomedical informatics. Automated machine learning methods can make effective use of such data and have become common tools for predictive modeling in healthcare. Research in medical informatics has explored the potential of deep learning and classical models in emergent care scenarios. In particular, predicting differential diagnoses for admissions has proven useful in decreasing unnecessary lab tests and improving inpatient triage decision-making. Moreover, identifying patients at high risk of in-hospital mortality is vitally important for optimizing the allocation of medical resources.

The Medical Information Mart for Intensive Care (MIMIC-III) database, which contains de-identified critical care inpatient data, was used in our study. This data set captures hospital patient laboratory measurements, pharmacologic prescriptions, diagnostic data and procedure event recordings. After restricting to adult patients and excluding admissions with an ICU length of stay of less than 24 hours, there were 37,787 unique admissions and 30,414 total patients. We examined the 25 most prevalent ICD-9 group-level disease categories in MIMIC-III using a multi-label classification model. In-hospital mortality was modeled as a binary classification problem, with 4,155 (13%) adult patients who expired, of whom 3,138 (75.5%) were in the ICU setting. Prediction performance was measured by AUC, F1 score, sensitivity and specificity, calculated for each disease label.

Using ICD-9 group codes reduced the feature dimension from 14,567 to 942 and greatly improved the distribution of patient diagnostic categories. Disease temporal patterns were captured by considering the 6 most frequently sampled vital signs and 13 laboratory values. Missing data were imputed at each time stamp.
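The per-label metrics named above can be illustrated with a minimal Python sketch. It is not the study's evaluation code: the function names (`per_label_metrics`, `auc`) and the toy label matrices are assumptions made for illustration, and AUC is computed with the generic pairwise (Mann-Whitney) definition.

```python
from typing import Dict, List


def auc(scores: List[float], labels: List[int]) -> float:
    """Pairwise AUC: the chance a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return float("nan")  # undefined when only one class is present
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def per_label_metrics(y_true: List[List[int]],
                      y_pred: List[List[int]]) -> List[Dict[str, float]]:
    """Sensitivity, specificity and F1 for each label column of a multi-label task."""
    results = []
    for j in range(len(y_true[0])):
        tp = fp = tn = fn = 0
        for truth, pred in zip(y_true, y_pred):
            if truth[j] == 1 and pred[j] == 1:
                tp += 1
            elif truth[j] == 0 and pred[j] == 1:
                fp += 1
            elif truth[j] == 0 and pred[j] == 0:
                tn += 1
            else:
                fn += 1
        results.append({
            "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
            "specificity": tn / (tn + fp) if tn + fp else 0.0,
            "f1": 2 * tp / (2 * tp + fp + fn) if 2 * tp + fp + fn else 0.0,
        })
    return results


# Toy data (hypothetical): 4 admissions, 2 disease labels.
y_true = [[1, 0], [1, 1], [0, 0], [0, 1]]
y_pred = [[1, 0], [0, 1], [0, 1], [0, 1]]
metrics = per_label_metrics(y_true, y_pred)

# AUC needs continuous scores rather than thresholded predictions.
label0_auc = auc([0.9, 0.4, 0.3, 0.1], [1, 1, 0, 0])
```

Reporting each label separately, as the abstract describes, keeps rare diagnoses from being masked by the more prevalent ones.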
Raw hourly average time-series values were converted into 5 summary features (mean, standard deviation, number of observations, minimum and maximum). Patient demographic variables such as age, gender, marital status and ethnicity were also included in the modeling. Choi et al. showed that contextual embeddings of medical data built from diagnostic and procedural codes alone can predict future diagnoses with sensitivity as high as 0.79. We used an embedding technique called word2vec, which transforms sparse representations of medical history into dense word vectors. The mappings captured contextual information by treating each admission as a sentence and learning the most likely neighboring words in a sliding-window fashion. Binary and multi-label classification were performed with collapse models, which do not consider temporal information, as well as with recurrent neural networks using regularization, a softmax output-layer activation and categorical cross-entropy as the loss function.
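The sliding-window idea behind the embedding step can be sketched as follows. This shows only how (center, context) training pairs might be drawn when an admission is treated as a sentence of medical codes. It does not train embeddings, and the function name `context_pairs`, the window size and the code tokens are hypothetical rather than taken from the study.

```python
from typing import List, Tuple


def context_pairs(admission: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, neighbor) pairs for every token within `window` positions."""
    pairs = []
    for i, center in enumerate(admission):
        lo = max(0, i - window)
        hi = min(len(admission), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a token is not its own context
                pairs.append((center, admission[j]))
    return pairs


# Hypothetical admission: an ordered list of diagnosis/procedure tokens.
admission = ["dx_428", "dx_584", "proc_96", "dx_038"]
pairs = context_pairs(admission, window=1)
```

A word2vec-style model would then learn dense vectors such that each center token is predictive of its neighbors, so codes that tend to co-occur within admissions end up close together in the embedding space.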
