Evaluation of network inference algorithms by means of synthetic data and characterization of graphs using probabilistic decomposition

A. Verschoren,Koenraad Van Leemput

Evaluation of network inference algorithms by means of synthetic data and characterization of graphs using probabilistic decomposition

2008

Recent technological advances have had a significant impact on the way biological research is being done, making it an increasingly large-scale and data-rich field. Simultaneously, a shift toward studying networks of interactions in a biological system has presented researchers with new challenges regarding data integration and analysis and has led to the emergence of new fields of research, such as bioinformatics and computational biology. Within these disciplines, finding ways to validate new algorithmic techniques and to investigate their performance and limitations remains an important challenge. For instance, the limited availability and incompleteness of experimental data severely constrains our ability to validate network inference algorithms—algorithms that attempt to reverse engineer the structure of biological networks using a variety of data sources. In this dissertation, a tool (SynTReN) was developed to address this challenge by generating biologically realistic synthetic microarray data. Through adjusting the attributes of the generated data, SynTReN can be used to probe the operational characteristics of inference algorithms in a fast and reproducible manner. The application scope of a number of published inference algorithms was explored, establishing both the added value of biologically plausible synthetic expression data in general, and the flexibility of SynTReN in particular. Among other variables, network topology was shown to significantly impact algorithm performance. This finding, coupled with the fact that biological networks share specific structural properties, suggests that prior knowledge about network structure could be a useful aid to inference. However, thus far it has proved difficult to capture this information in a sufficiently detailed model. Therefore, building on existing work demonstrating the presence in biological networks of recurring small graph structures, known as motifs, a new model for characterizing graphs was introduced. By probabilistically combining motifs, key topological characteristics can be captured, allowing a family of graphs to be both described and generated through decomposition into and composition from motif sequences. An evolutionary computing approach was used to learn appropriate motif sets for new graph classes. Computational complexity was found to be the main disadvantage of the method, limiting its usefulness for complex, realistic graph types.

Keywords:

Correction
Source
Cite
Save
Machine Reading By IdeaReader

References

Citations