Comparative Study of Various Approaches for Ensemble-based De-identification of Electronic Health Record Narratives.

Youngjun Kim, Paul M. Heider, Stéphane M. Meystre

2020

De-identification of electronic health record narratives is a fundamental natural language processing task for protecting patient privacy. We explore different types of ensemble learning methods to improve clinical text de-identification, and present two ensemble-based approaches for combining multiple predictive models. The first method selects an optimal subset of de-identification models by greedy exclusion. This ensemble pruning saves computational time and physical resources while achieving performance similar to or better than the ensemble of all members. The second method is a stacked ensemble framed as sequence labelling: a sequential model is trained over sequences of words to combine the predictions of the individual models. For this stacked ensemble, we employ search-based structured prediction and bidirectional long short-term memory algorithms. We create ensembles consisting of de-identification models trained on two clinical text corpora. Experimental results show that our ensemble systems effectively integrate predictions from individual models and generalize better across the two corpora.
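
As a rough illustration of ensemble pruning by greedy exclusion, the sketch below starts from the full set of models and repeatedly drops the member whose removal hurts a validation score the least, stopping once any further removal would lower the score. The abstract does not specify the combination rule or the selection metric, so the token-level majority vote, the token-level F1 score, and all function names here are assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter
from typing import Dict, List, Sequence

Labels = List[str]


def majority_vote(predictions: Sequence[Labels]) -> Labels:
    """Token-level majority vote (assumed combiner); ties go to the label seen first."""
    return [Counter(token_labels).most_common(1)[0][0]
            for token_labels in zip(*predictions)]


def token_f1(gold: Labels, pred: Labels, outside: str = "O") -> float:
    """Token-level F1 over all non-outside labels (assumed selection metric)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == p and g != outside)
    if tp == 0:
        return 0.0
    precision = tp / sum(1 for p in pred if p != outside)
    recall = tp / sum(1 for g in gold if g != outside)
    return 2 * precision * recall / (precision + recall)


def greedy_exclusion(model_preds: Dict[str, Labels], gold: Labels) -> List[str]:
    """Drop one model per round while the ensemble score does not decrease."""
    selected = list(model_preds)
    best = token_f1(gold, majority_vote([model_preds[m] for m in selected]))
    while len(selected) > 1:
        # Score every ensemble obtained by leaving exactly one member out.
        candidates = []
        for m in selected:
            rest = [model_preds[x] for x in selected if x != m]
            candidates.append((token_f1(gold, majority_vote(rest)), m))
        score, drop = max(candidates, key=lambda c: c[0])
        if score < best:  # every removal hurts, so stop pruning
            break
        best = score
        selected.remove(drop)
    return selected
```

In practice the scores would be computed on a held-out validation set rather than on the data used to train the member models, so that the pruned subset is not selected on examples the members have already seen.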
