Forecasting Tuberculosis Incidence in China using Baidu Index: A Comparative Study

2020 
Background: Tuberculosis is a common infectious disease that primarily affects the lungs and has high mortality and prevalence. Accurate prediction of tuberculosis incidence is important for countering epidemics and allocating resources effectively. The main objective of this study is to investigate the effectiveness of web search queries in predicting the incidence of tuberculosis in China. We conduct a comprehensive comparison of data-driven methods for predicting tuberculosis incidence.

Methods: Several data mining models are implemented in our study, including stepwise linear regression and support vector machines (SVM) incorporating the Baidu Index (a record of search queries on Baidu, the main search engine in China). These two methods are compared with the traditional time series methods of autoregressive integrated moving average (ARIMA) and seasonal ARIMA (SARIMA). In addition, to further investigate the reliability of prediction, we explore the effectiveness of integrating the individual models: a hybrid model of SARIMA and SVM and Bayesian model averaging (BMA) are adopted to maximize the predictive utility of the models.

Results and Conclusion: The experimental results show that Internet queries provide an effective data source for predicting tuberculosis, with predictive ability comparable to that of traditional time series models. They also show that combining two or more models using BMA or hybrid models can improve predictive ability, with BMA giving by far the best results in terms of both MAPE and RMSE in the 5 areas studied (Guangdong, Beijing, Tianjin and Shanghai). The findings from this study pave the way for developing accurate and timely prediction of tuberculosis cases, which is important for allocating healthcare resources and developing strategies to counter possible future outbreaks in practice.