Matching experiments across species using expression values and textual information

Aaron Wise,Zoltán N. Oltvai,Ziv Bar-Joseph

Matching experiments across species using expression values and textual information

2012

Motivation: With the vast increase in the number of gene expression datasets deposited in public databases, novel techniques are required to analyze and mine this wealth of data. Similar to the way BLAST enables cross-species comparison of sequence data, tools that enable cross-species expression comparison will allow us to better utilize these datasets: cross-species expression comparison enables us to address questions in evolution and development, and further allows the identification of disease-related genes and pathways that play similar roles in humans and model organisms. Unlike sequence, which is static, expression data changes over time and under different conditions. Thus, a prerequisite for performing cross-species analysis is the ability to match experiments across species. Results: To enable better cross-species comparisons, we developed methods for automatically identifying pairs of similar expression datasets across species. Our method uses a co-training algorithm to combine a model of expression similarity with a model of the text which accompanies the expression experiments. The co-training method outperforms previous methods based on expression similarity alone. Using expert analysis, we show that the new matches identified by our method indeed capture biological similarities across species. We then use the matched expression pairs between human and mouse to recover known and novel cycling genes as well as to identify genes with possible involvement in diabetes. By providing the ability to identify novel candidate genes in model organisms, our method opens the door to new models for studying diseases. Availability: Source code and supplementary information is available at: www.andrew.cmu.edu/user/aaronwis/cotrain12. Contact: [email protected] Supplementary information:Supplementary data are available at Bioinformatics online.

Keywords:

Correction
Source
Cite
Save
Machine Reading By IdeaReader

References

Citations