Word Embedding for French Natural Language in Healthcare: A Comparative Study.

Emeric Dynomant, Romain Lelong, Badisse Dahamna, Clément Massonnaud, Gaétan Kerdelhué, Julien Grosjean, Stéphane Canu, Stéfan Jacques Darmoni

2019

Structuring raw medical documents through ontology mapping is the next step for medical intelligence. Deep learning models take numerically encoded information as input, such as embedded text, and word embedding methods provide that encoding by representing every word of a text as a fixed-length vector. This study formally evaluates three word embedding methods on raw medical documents. The corpus comprises more than 12M diverse documents produced at the Rouen hospital (drug prescriptions, discharge and surgery summaries, inter-service letters, etc.). Automatic and manual validation shows that Word2Vec with the skip-gram architecture scored best on three of the four accuracy tests. This model will now be used as the first layer of an AI-based semantic annotator.
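
As a rough illustration of the skip-gram idea mentioned in the abstract, the sketch below generates the (center word, context word) training pairs that a skip-gram model learns from. It is a minimal example and not the authors' pipeline: the function name `skipgram_pairs`, the window size, and the toy sentence are all assumptions made for illustration, and the actual training step that turns these pairs into fixed-length vectors is omitted.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for tokens within `window` positions.

    Hypothetical helper: skip-gram trains a model to predict each context
    word from its center word, so these pairs are the training examples.
    """
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Toy sentence invented for this example ("patient admitted for chest pain");
# it is not taken from the study's corpus.
sentence = "patient admis pour douleur thoracique".split()
pairs = skipgram_pairs(sentence, window=1)
```

With a window of 1, each word is paired only with its immediate neighbours; a larger window captures broader context at the cost of more training pairs.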

Keywords:

Word embedding
Linguistics
Computer science
Health care
Natural language
Artificial intelligence
Deep learning
Semantic integration
Word2vec
Medicine
Structuring
Data mining
Natural language processing
Architecture
Word processing
