Investigation into probabilistic inference with uncertain data acquired from different data source

Huaying Zhu

2018

This thesis investigates probabilistic inference with uncertain data acquired from different data sources. The research originates from a case study of asthma, which causes a large number of deaths and has widespread influence. As there is no medical cure for the disease, control is very important. The NHS measures the level of asthma control in five steps, but gives no clear instruction on how to identify a patient's control step. It is therefore necessary to develop diagnosis rules for asthma control steps, both to supplement the current guideline and to help patients monitor and manage the condition better. Classification techniques are considered for developing these rules, but the data are collected from different sources, which leads to uncertainty in the prior distribution of the data. This uncertainty hinders modelling with traditional techniques such as Bayesian networks, logistic regression, and artificial neural networks (ANN), because they require a known prior distribution. In the asthma case, data are collected from different clinicians with different preferences in recording patients' information. As a result, few patients share the same recorded symptoms, and the data set is too small to build models with all or some of the variables. Estimating missing values is inappropriate in this case because it is unknown why a record for a symptom is missing. As a prior-free likelihood inference process, the Evidential Reasoning (ER) rule has the potential to deal with the challenge of an uncertain prior. This thesis explores the ER rule as a way to resolve the uncertainty in the prior distribution and develops a new model for identifying asthma control steps. The ER rule is applied to combine multiple pieces of evidence, with each piece of evidence acquired from an observed variable and represented as a probability distribution on the hypothesis space. The proposed ER prognostic model offers desirable flexibility when dealing with multiple pieces of evidence from different data sources.
Compared with the other classification methods considered, the ER-based model is the only one that takes the quality of evidence into consideration. It is therefore uniquely suited to dealing with data collected from multiple sources. The model for identifying asthma control steps can be generalised as a probabilistic inference method that can be applied to other classification or reasoning problems with an uncertain prior. This thesis compares four different training methods through controlled experiments on four data sets. The comparison further investigates the ER-based model and contributes to existing knowledge on parameter training in the ER rule by identifying which method is more appropriate under different scenarios. Besides parameter training, another problem arises: how to deal with continuous variables. Existing methods discretise the data, which leads to information loss and bias in the results. This thesis proposes a method that applies continuous variables directly in the ER-based model, avoiding the information loss caused by discretisation. Apart from its utility in the ER-based model with continuous data, the method can be extended to other probabilistic inference methods.
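
As a rough illustration of the idea that each observed variable yields a probability distribution on the hypothesis space, and that these distributions are then combined without a prior, the following minimal Python sketch normalises likelihoods per variable and combines them by a renormalised product. This is a deliberate simplification and not the full ER rule as developed in the thesis: it assumes every piece of evidence is fully reliable and omits the weight and reliability parameters. The function names, hypothesis labels, and likelihood values are all hypothetical.

```python
from typing import Dict, List


def evidence_from_likelihoods(likelihoods: Dict[str, float]) -> Dict[str, float]:
    """Turn p(observation | hypothesis) values into a distribution over hypotheses.

    No prior is used: the likelihoods are simply normalised to sum to one.
    """
    total = sum(likelihoods.values())
    if total <= 0:
        raise ValueError("at least one likelihood must be positive")
    return {h: v / total for h, v in likelihoods.items()}


def combine(pieces: List[Dict[str, float]]) -> Dict[str, float]:
    """Combine pieces of evidence by a renormalised product.

    Simplifying assumption (not from the thesis): all pieces are fully
    reliable, so no weights or reliabilities are applied.
    """
    if not pieces:
        raise ValueError("need at least one piece of evidence")
    hypotheses = list(pieces[0])
    joint = {h: 1.0 for h in hypotheses}
    for piece in pieces:
        for h in hypotheses:
            joint[h] *= piece[h]
    total = sum(joint.values())
    if total <= 0:
        raise ValueError("evidence is completely conflicting")
    return {h: v / total for h, v in joint.items()}


# Hypothetical likelihoods for two observed symptoms and two control steps.
symptom_a = evidence_from_likelihoods({"step_1": 0.2, "step_2": 0.6})
symptom_b = evidence_from_likelihoods({"step_1": 0.3, "step_2": 0.3})
combined = combine([symptom_a, symptom_b])
print(combined)
```

In this sketch a record that lacks a symptom is handled by leaving that piece of evidence out of the list passed to `combine`, so no missing value has to be estimated. The second symptom above is uninformative (equal likelihoods), so it leaves the combined distribution unchanged.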
