Knowledge Graph Building from Real-world Multisource “Dirty” Clinical Electronic Medical Records for Intelligent Consultation Applications

2021 
Intelligent clinical consultation is a diagnostic support system that infers the likely diseases from a patient's chief complaints, based on established relationships between symptoms and diseases. The key is to automatically learn and build a general “symptom-disease” medical knowledge graph (MKG) from real-world clinical data. The quality of the clinical data (chiefly electronic medical records, EMRs) therefore directly affects the quality of the MKG, which in turn determines the quality of the consultation results. The regional public health information platform has gathered a large number of EMR front pages from hospitals of all tiers across the region. Because the health IT systems used by hospitals are often sourced from different vendors, each with its own data standards and data quality control criteria, the quality of the collected EMRs inevitably varies. This is even more so given the gaps in knowledge and skills between clinicians at different qualification levels. Through detailed analysis of one such collection, we found that the two most prominent problems are inconsistency in the diagnosis results and mismatch between the diagnosis results and the chief complaints and history of present illness. To ensure the quality and effectiveness of building a knowledge graph from these real-world data, this paper proposes a “dirty” data cleaning framework that includes diagnostic results normalization and semantic similarity matching. The symptom-disease knowledge graph constructed from the cleaned data has been applied and verified in the intelligent consultation system.
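The two cleaning steps named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the alias table, the reference symptom descriptions, the function names, and the threshold are all hypothetical, and a simple bag-of-words cosine similarity stands in for whatever semantic similarity model the authors use.

```python
import math
from collections import Counter

# Hypothetical alias table; a real system would map to a standard vocabulary.
DIAGNOSIS_ALIASES = {
    "urti": "upper respiratory tract infection",
    "upper respiratory infection": "upper respiratory tract infection",
    "htn": "hypertension",
}

# Hypothetical reference symptom descriptions per normalized diagnosis.
REFERENCE_SYMPTOMS = {
    "upper respiratory tract infection": "cough sore throat runny nose fever",
    "hypertension": "headache dizziness elevated blood pressure",
}


def normalize_diagnosis(raw):
    """Lowercase, collapse whitespace, and map known aliases to one name."""
    key = " ".join(raw.lower().split())
    return DIAGNOSIS_ALIASES.get(key, key)


def cosine_similarity(a, b):
    """Bag-of-words cosine similarity (a stand-in for a semantic model)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def keep_record(record, threshold=0.2):
    """Keep a record only if its chief complaint matches its diagnosis."""
    diagnosis = normalize_diagnosis(record["diagnosis"])
    reference = REFERENCE_SYMPTOMS.get(diagnosis)
    if reference is None:
        return False  # unknown diagnosis: cannot be verified, so drop it
    return cosine_similarity(record["chief_complaint"], reference) >= threshold


records = [
    {"diagnosis": "URTI", "chief_complaint": "cough and fever for three days"},
    {"diagnosis": "HTN", "chief_complaint": "cough and fever for three days"},
]
cleaned = [r for r in records if keep_record(r)]
```

In this toy input, the first record is kept and the second is dropped because its chief complaint shares no terms with the reference description of its diagnosis. The threshold of 0.2 is arbitrary and would need tuning on real data.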