Impact of hierarchies of clinical codes on predicting future days in hospital

2015 
Health insurance claims contain valuable information for predicting the future health of a population. With many mature machine learning algorithms now available, models can be built to predict future medical costs and hospitalizations. However, it is well known that the way in which the data are represented significantly affects the performance of machine learning algorithms. In health insurance claims, key clinical information comes mainly from the associated clinical codes, such as diagnosis codes and procedure codes, which are hierarchically structured. This study investigates whether the hierarchies of such clinical codes can be used to improve predictive performance when predicting future days in hospital. Empirical investigations were carried out on data sets of different sizes, since the frequency with which lower-level (more specific) clinical codes appear could vary significantly between populations of different sizes. Bagged trees were compared across five feature sets: basic demographic features only, low-level clinical codes, medium-level clinical codes, high-level clinical codes, and a full feature set. The main finding is that the different hierarchies of clinical codes do not have a significant impact on predictive power. Other findings include: 1) sample size greatly affects the predictive outcome (more observations result in more stable and more accurate outcomes); 2) the combined use of enriched demographic features and clinical features gives better performance than using either separately.
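To make the idea of code hierarchies concrete, the following minimal sketch shows how one list of diagnosis codes could be turned into count features at three levels of specificity. It is an illustration only, not the study's pipeline: the abstract does not name the coding system or define the levels, so the ICD-9-CM-style codes, the partial `CHAPTERS` table, and the helper names `roll_up` and `count_features` are all assumptions.

```python
from collections import Counter

# Assumption: ICD-9-CM-style codes. Only three chapter ranges are listed
# here for illustration; a real mapping would cover every chapter.
CHAPTERS = [
    (240, 279, "endocrine"),
    (390, 459, "circulatory"),
    (460, 519, "respiratory"),
]


def roll_up(code, level):
    """Map a code to the requested hierarchy level (hypothetical helper)."""
    category = code.split(".")[0]  # part before the decimal point
    if level == "low":
        return code                # most specific: the full code
    if level == "medium":
        return category            # three-digit category
    if level == "high":
        if category.isdigit():
            number = int(category)
            for start, end, name in CHAPTERS:
                if start <= number <= end:
                    return name
        return "other"             # codes outside the illustrative table
    raise ValueError(f"unknown level: {level}")


def count_features(codes, level):
    """Count how often each rolled-up code appears for one member."""
    return Counter(roll_up(code, level) for code in codes)


# Made-up claim history for a single member.
claims = ["250.01", "250.02", "428.0", "486"]
low = count_features(claims, "low")
medium = count_features(claims, "medium")
high = count_features(claims, "high")
```

Rolling codes up trades specificity for density: the two distinct low-level codes `250.01` and `250.02` collapse into one medium-level feature with a count of 2. Feature vectors built this way at each level could then be passed to any bagged-tree learner for comparison.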