Graph based fusion of high-dimensional gene- and microRNA expression data
2013
One of the main goals in cancer studies that include high-throughput microRNA
(miRNA) and mRNA data is to find and assess prognostic signatures capable
of predicting clinical outcome. Expression changes of both mRNAs and miRNAs in
cancer have been reported to reflect clinical characteristics such as stage and
prognosis. Furthermore, miRNA abundance can directly affect target transcripts
and their translation in tumor cells. Prediction models are usually trained to
identify either an mRNA or a miRNA signature for patient stratification. With the
increasing number of microarray studies that measure mRNA and miRNA expression
in the same patient cohort, there is a need for statistical methods that
integrate, or fuse, both kinds of data into one prediction model in order to
find a combined signature that improves prediction.
Here, we propose a new method to fuse miRNA and mRNA data into one
prediction model. Since miRNAs are known regulators of mRNAs, correlations
between miRNA and mRNA expression data as well as target prediction
information were used to build a bipartite graph representing the relations
between miRNAs and mRNAs.
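The graph construction can be sketched as follows. This is a minimal illustration under stated assumptions, not the original implementation: it assumes an edge links a miRNA to an mRNA when the mRNA is a predicted target of that miRNA and the two are negatively correlated at or below a cut-off. The function name build_bipartite_graph, the boolean input predicted_targets (which in practice would come from a target prediction resource) and the cut-off threshold=-0.3 are all hypothetical.

```python
import numpy as np


def build_bipartite_graph(mirna, mrna, predicted_targets, threshold=-0.3):
    """Return a boolean (n_mirna, n_mrna) adjacency matrix (sketch).

    mirna: (n_samples, n_mirna) expression matrix
    mrna: (n_samples, n_mrna) expression matrix, same samples in same order
    predicted_targets: (n_mirna, n_mrna) booleans from a target prediction
        resource (hypothetical input)
    threshold: hypothetical cut-off; an edge needs a Pearson correlation at
        or below it, i.e. the negative association expected when a miRNA
        represses its target
    """
    mirna = np.asarray(mirna, dtype=float)
    mrna = np.asarray(mrna, dtype=float)
    # z-score each column; with population standard deviations the scaled
    # cross-product below equals the Pearson correlation matrix.
    zx = (mirna - mirna.mean(axis=0)) / mirna.std(axis=0)
    zy = (mrna - mrna.mean(axis=0)) / mrna.std(axis=0)
    corr = zx.T @ zy / mirna.shape[0]
    # Keep only pairs that are both strongly anti-correlated and predicted targets.
    return (corr <= threshold) & np.asarray(predicted_targets, dtype=bool)
```

Requiring both conditions keeps the graph sparse: correlation alone would link many unrelated features, and target prediction alone ignores whether the interaction is visible in this cohort.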
Feature selection is a critical step when fitting prediction models to
high-dimensional data. Most methods treat features, in this case genes or miRNAs,
as independent, an assumption that does not hold when dealing with
combined gene and miRNA expression data. To improve prediction accuracy, a
description of the correlation structure in the data is needed. In this work, the
bipartite graph was used to guide the feature selection, thereby improving
prediction results and yielding a stable prognostic signature of miRNAs and genes.
The method was evaluated on a prostate cancer data set comprising 98 patient
samples with miRNA and mRNA expression data. Biochemical relapse, an
important event in prostate cancer treatment, was used as the clinical endpoint.
Biochemical relapse denotes a renewed rise in the blood level of the
prostate-specific antigen (PSA) after surgical removal of the prostate. The
relapse may indicate metastases and is usually the point in clinical practice
at which further treatment is decided.
A boosting approach was used to predict biochemical relapse. It could
be shown that combining the bipartite graph with miRNA and mRNA
expression data improved prediction performance. Furthermore, the
approach improved the stability of the feature selection and thereby yielded
more consistent marker sets. By construction, the marker sets produced by this
new method contain mRNAs as well as miRNAs.
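The abstract does not state how selection stability was quantified. A common choice, assumed here, is to refit the model on resampled versions of the data and average the pairwise Jaccard index of the selected marker sets; values closer to 1 indicate more consistent marker sets. The function names and the marker names in the example are placeholders.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two marker sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty selections agree trivially
    return len(a & b) / len(a | b)


def selection_stability(marker_sets):
    """Mean pairwise Jaccard index over marker sets from resampled fits."""
    pairs = list(combinations(marker_sets, 2))
    if not pairs:
        raise ValueError("need at least two marker sets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Three hypothetical resampled fits, each selecting one miRNA and one gene.
sets = [{"mir_a", "gene_a"}, {"mir_a", "gene_a"}, {"mir_a", "gene_b"}]
print(round(selection_stability(sets), 3))  # 0.556
```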
The new approach was compared with two state-of-the-art methods suited for
high-dimensional data and showed better prediction performance than both.