Tracking COVID-19 using online search

2020 
Online search data are routinely used to monitor the prevalence of infectious diseases, such as influenza. Previous work has focused on developing supervised models, where ground truth data, in the form of historical syndromic surveillance reports, can be used to train machine learning models. However, sufficient data, in terms of both accuracy and time span, do not exist to apply such approaches to monitoring the emerging COVID-19 pandemic caused by a novel coronavirus (SARS-CoV-2). Therefore, unsupervised or semi-supervised solutions should be sought. Recent work has shown that it is possible to transfer an online-search-based model for influenza-like illness from a source to a target country without using ground truth data for the target location. The transferred model's accuracy depends on choosing search queries and their corresponding weights wisely for the target location, via a transfer learning methodology. In this work, we draw a parallel to these findings and attempt to develop an unsupervised model for COVID-19 by: (i) carefully choosing search queries that refer to related symptoms, as identified by Public Health England in the United Kingdom (UK), and (ii) weighting them based on their reported ratio of occurrence in people infected by COVID-19. Finally, recognising that online searches may also be driven by concern rather than infection, we devise a preliminary approach that aims to minimise this part of the signal by incorporating a news media signal in association with confirmed COVID-19 cases. Results are presented for the UK, England, the United States of America, Canada, Australia, and Greece.
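To make step (ii) and the concern adjustment more concrete, here is a minimal Python sketch, not the authors' implementation. The function names `weighted_symptom_score` and `remove_news_component` are hypothetical. Symptom occurrence ratios would have to come from published clinical reports; none are hard-coded here. The concern adjustment is an assumed stand-in, because the abstract does not specify the actual method. It fits the search score by ordinary least squares on a news-coverage signal and on confirmed cases, then subtracts only the fitted news term.

```python
import numpy as np


def weighted_symptom_score(query_freqs, symptom_ratios):
    """Combine per-symptom search frequencies into one unsupervised score.

    query_freqs: dict mapping a symptom name to a 1-D sequence of search
        frequencies (e.g. one value per day); all sequences share a length.
    symptom_ratios: dict mapping the same symptom names to the reported share
        of infected people who experience that symptom.
    Each symptom's frequencies are weighted by its ratio, with the weights
    normalised to sum to 1 so the score stays on the scale of the inputs.
    """
    total = sum(symptom_ratios.values())
    if total <= 0:
        raise ValueError("symptom ratios must sum to a positive value")
    weighted = [
        (ratio / total) * np.asarray(query_freqs[symptom], dtype=float)
        for symptom, ratio in symptom_ratios.items()
    ]
    return np.sum(weighted, axis=0)


def remove_news_component(score, news, cases):
    """Hypothetical concern adjustment (an assumption, not the paper's method).

    Fits score ~ intercept + b_news * news + b_cases * cases by ordinary least
    squares, then subtracts only the fitted news term. The part of the signal
    that co-varies with confirmed cases is kept.
    """
    score = np.asarray(score, dtype=float)
    news = np.asarray(news, dtype=float)
    cases = np.asarray(cases, dtype=float)
    # Design matrix columns: intercept, news coverage, confirmed cases.
    design = np.column_stack([np.ones_like(score), news, cases])
    coef, *_ = np.linalg.lstsq(design, score, rcond=None)
    return score - coef[1] * news
```

With toy inputs `{"fever": [1, 2], "cough": [3, 4]}` and ratios `{"fever": 0.75, "cough": 0.25}`, `weighted_symptom_score` returns `[1.5, 2.5]`. The least-squares adjustment assumes the news and case series are not collinear. With only a short history of data, its coefficients would be unstable.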