Automatic Extraction and Decryption of Abbreviations from Domain-Specific Texts.

Michil P. Egorov, Anastasia A. Funkner

2021

This paper explores the problems of extracting and decrypting abbreviations in domain-specific Russian texts. The main focus is on unstructured electronic medical records, which pose specific preprocessing problems. The major challenge is that there is no uniform way to write medical histories. The aim of the paper is to generalize abbreviation decryption so that it works on any variant of text. A dataset of nearly three million medical records was collected, and a classifier model was trained to extract and decrypt abbreviations. Tested on 224,307 records, the model achieved an F1 score of 93.7% on the validation dataset.
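
The abstract does not describe the classifier's features or architecture, so what follows is only a minimal sketch of the overall task under stated assumptions. It uses a hypothetical toy dictionary (`DICTIONARY`) and a regex heuristic that flags short all-uppercase or dictionary-listed tokens as candidate abbreviations. A simple context-cue overlap rule stands in for the trained classifier when choosing an expansion. All names (`tokenize`, `normalize`, `extract`, `expand`, `f1`) are illustrative and are not the authors' code. The `f1` helper shows one common way to score (token, expansion) predictions; the paper's exact evaluation protocol is not given here.

```python
from __future__ import annotations

import re

# Hypothetical toy dictionary (an illustrative assumption, not the paper's data):
# abbreviation key -> {expansion: set of context cue words}.
DICTIONARY: dict[str, dict[str, set[str]]] = {
    "ад": {
        "артериальное давление": {"мм", "рт", "ст"},  # blood pressure
        "адреналин": {"мг", "введен", "инъекция"},  # adrenaline
    },
    "чсс": {"частота сердечных сокращений": {"уд", "мин"}},  # heart rate
}

# Letter runs optionally joined by '.' or '/', e.g. "АД", "А.Д.", "в/в", "рт."
TOKEN_RE = re.compile(r"[^\W\d_]+(?:[./][^\W\d_]+)*\.?")
LETTERS_RE = re.compile(r"[^\W\d_]+")


def tokenize(text: str) -> list[str]:
    """Split a record into word-like tokens; numbers and punctuation are dropped."""
    return TOKEN_RE.findall(text)


def normalize(token: str) -> str:
    """Map spelling variants such as 'АД', 'А.Д.' and 'а/д' to the key 'ад'."""
    return re.sub(r"[./]", "", token).lower()


def extract(tokens: list[str]) -> list[int]:
    """Return indices of tokens that look like abbreviations.

    Heuristic (assumption): a 2-5 letter all-uppercase token, or any token
    whose normalized form is a dictionary key.
    """
    found = []
    for i, tok in enumerate(tokens):
        letters = re.sub(r"[./]", "", tok)
        if normalize(tok) in DICTIONARY or (letters.isupper() and 2 <= len(letters) <= 5):
            found.append(i)
    return found


def expand(tokens: list[str], window: int = 3) -> list[tuple[int, str, str | None]]:
    """Choose an expansion per candidate by overlap with nearby context words.

    This cue-overlap rule is a simple stand-in for a trained classifier; ties
    (including zero overlap) fall back to the first listed expansion.
    """
    results = []
    for i in extract(tokens):
        senses = DICTIONARY.get(normalize(tokens[i]))
        if not senses:
            results.append((i, tokens[i], None))  # detected, but no known expansion
            continue
        nearby = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        # Split joined tokens such as "уд/мин." into separate lowercase words.
        context = {w for t in nearby for w in LETTERS_RE.findall(t.lower())}
        best = max(senses, key=lambda s: len(senses[s] & context))
        results.append((i, tokens[i], best))
    return results


def f1(predicted: set, gold: set) -> float:
    """F1 over sets of (token index, expansion) pairs."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    record = "АД 130/80 мм рт. ст., ЧСС 72 уд/мин. ЭКГ без изменений."
    for index, token, expansion in expand(tokenize(record)):
        print(index, token, expansion)
```

On the sample record, `АД` maps to `артериальное давление` (blood pressure) and `ЧСС` maps to `частота сердечных сокращений` (heart rate). `ЭКГ` is detected as a candidate but left unexpanded because the toy dictionary has no entry for it. Normalizing `А.Д.`, `а/д` and `АД` to one key is a cheap way to absorb some of the spelling variation the abstract describes; a learned classifier would replace the hand-written cue sets.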
