Deep Bidirectional Recurrent Neural Networks as End-To-End Models for Smoking Status Extraction from Clinical Notes in Spanish

2018 
Although natural language processing (NLP) tools have been available in English for quite some time, this is not the case for many other languages, particularly for context-specific texts such as clinical notes. This poses a challenge for tasks like text classification in languages other than English. In the absence of basic NLP tools, manually engineering features that capture the semantic information of the documents is a potential solution, but it is very time-consuming. Deep neural networks, particularly deep recurrent neural networks (RNNs), have been proposed as end-to-end models that learn both features and parameters jointly, thus avoiding the need to manually encode the features. We compared the performance of two classifiers for labeling 14,718 clinical notes in Spanish according to the patients' smoking status: a bag-of-words model involving heavy manual feature engineering and a bidirectional long short-term memory (LSTM) deep RNN with GloVe word embeddings. The RNN slightly outperforms the bag-of-words model while requiring 80% less overall development time. Such algorithms can facilitate the exploitation of clinical notes in languages in which NLP tools are not as developed as in English.
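To make the contrast between the two classifier families concrete, the following is a minimal sketch in Python, not the models used in the study. It assumes a hypothetical four-word vocabulary, made-up one-dimensional "embeddings" standing in for GloVe vectors, and a plain tanh recurrence in place of an LSTM cell. The names `VOCAB`, `EMBED`, `bag_of_words`, `run_rnn`, and `bidirectional_encode` are illustrative only.

```python
import math
from collections import Counter

# Hypothetical toy vocabulary and notes; not taken from the paper's data.
VOCAB = ["paciente", "no", "fuma", "exfumador"]

# Hypothetical 1-dimensional "embeddings" standing in for GloVe vectors.
EMBED = {"paciente": 0.1, "no": -0.8, "fuma": 0.9, "exfumador": 0.5}


def bag_of_words(tokens, vocab=VOCAB):
    """Count how often each vocabulary word occurs; word order is discarded."""
    counts = Counter(tokens)
    return [counts[word] for word in vocab]


def run_rnn(tokens, w_x=1.0, w_h=0.5):
    """Plain tanh recurrence (a simplification, not an LSTM cell).

    Returns the final hidden state after reading the tokens in order.
    The weights w_x and w_h are arbitrary fixed values, not learned.
    """
    h = 0.0
    for tok in tokens:
        h = math.tanh(w_x * EMBED.get(tok, 0.0) + w_h * h)
    return h


def bidirectional_encode(tokens):
    """Read the sequence left-to-right and right-to-left, then pair the two final states."""
    return (run_rnn(tokens), run_rnn(list(reversed(tokens))))


note = ["paciente", "no", "fuma"]
shuffled = ["fuma", "no", "paciente"]

# The bag-of-words vectors cannot tell the two word orders apart.
print(bag_of_words(note), bag_of_words(shuffled))

# The bidirectional encoding depends on the order in which words are read.
print(bidirectional_encode(note), bidirectional_encode(shuffled))
```

In this toy setting the bag-of-words vectors are identical for both word orders, while the bidirectional encoding differs. This is one intuition for why a sequence model may need less hand-built feature engineering to capture negation or context, although the sketch says nothing about the accuracy or development-time figures reported above.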