Regardless of how creative, innovative, and elegant our computational methods are, the ultimate proof of an algorithm's worth is the experimentally validated quality of its predictions. Unfortunately, this truism is hard to reduce to practice. Modelers usually produce hundreds to hundreds of thousands of predictions, most (if not all) of which go untested. In a best-case scenario, a small subsample of predictions (usually three to ten) is experimentally validated, as a quality-control step to attest to the global soundness of the full set of predictions. However, whether this small set is even representative of the algorithm's global performance is a question usually left unaddressed. Thus, a clear understanding of the strengths and weaknesses of an algorithm most often remains elusive, especially to the experimental biologists who must decide which tool to use for a specific problem. In this chapter, we describe the first systematic set of challenges posed to the systems biology community in the framework of the DREAM (Dialogue for Reverse Engineering Assessments and Methods) project. These tests, which came to be known as the DREAM2 challenges, consist of data generously donated by participants to the DREAM project and curated so as to become problems of network reconstruction whose solutions, the actual networks behind the data, were withheld from the participants. The main topics discussed are the explanation of the resulting five challenges, a global comparison of the submissions, and a discussion of the best-performing strategies.
DREAM challenges are community competitions designed to advance computational methods and address fundamental questions in systems biology and translational medicine. Each challenge asks participants to develop and apply computational methods to either predict unobserved outcomes or to identify unknown model parameters given a set of training data. Computational methods are evaluated using an automated scoring metric, scores are posted to a public leaderboard, and methods are published to facilitate community discussions on how to build improved methods. By engaging participants from a wide range of science and engineering backgrounds, DREAM challenges can comparatively evaluate a wide range of statistical, machine learning, and biophysical methods. Here, we describe DREAMTools, a Python package for evaluating DREAM challenge scoring metrics. DREAMTools provides a command line interface that enables researchers to test new methods on past challenges, as well as a framework for scoring new challenges. As of March 2016, DREAMTools includes more than 80% of completed DREAM challenges. DREAMTools complements the data, metadata, and software tools available at the DREAM website http://dreamchallenges.org and on the Synapse platform at https://www.synapse.org. Availability: DREAMTools is a Python package. Releases and documentation are available at http://pypi.python.org/pypi/dreamtools. The source code is available at http://github.com/dreamtools/dreamtools.
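To make the idea of an automated scoring metric concrete, the sketch below computes one metric commonly used on DREAM leaderboards, the area under the ROC curve, via the rank-sum identity. This is an illustrative stand-in, not DREAMTools code: real challenges use challenge-specific metrics and gold standards, and the predictions and labels here are invented.

```python
# Minimal sketch of a leaderboard-style scoring metric (assumed example,
# not the DREAMTools API): AUROC via the Mann-Whitney rank-sum identity.

def auroc(scores, labels):
    """Area under the ROC curve; labels are 1 (true edge) or 0 (absent)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative labels")
    # Fraction of (positive, negative) pairs ranked correctly; ties count 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# A submission that ranks most true edges above absent ones scores near 1.0.
predictions = [0.9, 0.8, 0.4, 0.3, 0.1]
gold_standard = [1, 1, 0, 1, 0]
print(round(auroc(predictions, gold_standard), 3))  # 0.833
```

A leaderboard would apply such a function to every submission against the withheld gold standard and rank the results.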
Abstract Background One of the main challenges in metagenomics is the identification of microorganisms in clinical and environmental samples. While an extensive and heterogeneous set of computational tools is available to classify microorganisms using whole genome shotgun sequencing data, comprehensive comparisons of these methods are limited. In this study, we use the largest set to date (n = 35) of laboratory-generated and simulated controls across 846 species to evaluate the performance of eleven metagenomics classifiers. We also assess the effects of filtering and combining tools to reduce the number of false positives. Results Tools were characterized on the basis of their ability to (1) identify taxa at the genus, species, and strain levels, (2) quantify relative abundance measures of taxa, and (3) classify individual reads to the species level. Strikingly, the number of species identified by the eleven tools can differ by over three orders of magnitude on the same datasets. However, various strategies can ameliorate taxonomic misclassification, including abundance filtering, ensemble approaches, and tool intersection. Indeed, leveraging tools with different heuristics is beneficial for improved precision. Nevertheless, these strategies were often insufficient to completely eliminate false positives from environmental samples, which is especially important where medically relevant species are concerned and customized tools may be required. Conclusions The results of this study provide positive controls, titrated standards, and a guide for selecting tools for metagenomic analyses by comparing ranges of precision and recall. We show that proper experimental design and analysis parameters, including depth of sequencing, choice of classifier or classifiers, database size, and filtering, can reduce false positives, provide greater resolution of species in complex metagenomic samples, and improve the interpretation of results.
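Two of the false-positive reduction strategies named above, abundance filtering and tool intersection, can be sketched in a few lines. The taxon names, abundances, and threshold values below are invented for illustration; a real analysis would tune them against titrated standards.

```python
# Hedged sketch of two false-positive reduction strategies: abundance
# filtering and tool intersection. All data are hypothetical examples.
from collections import Counter

def abundance_filter(profile, min_abundance=0.001):
    """Drop taxa whose relative abundance falls below a threshold."""
    return {taxon: ab for taxon, ab in profile.items() if ab >= min_abundance}

def tool_intersection(calls_per_tool, min_tools=2):
    """Keep only taxa reported by at least `min_tools` classifiers."""
    votes = Counter(taxon for calls in calls_per_tool for taxon in set(calls))
    return {taxon for taxon, n in votes.items() if n >= min_tools}

# Hypothetical species calls from three classifiers with different heuristics.
tool_a = {"Salmonella enterica", "Escherichia coli", "Bacillus subtilis"}
tool_b = {"Salmonella enterica", "Escherichia coli"}
tool_c = {"Salmonella enterica", "Listeria monocytogenes"}
print(sorted(tool_intersection([tool_a, tool_b, tool_c], min_tools=2)))
# ['Escherichia coli', 'Salmonella enterica']
```

Requiring agreement between tools with different heuristics trades some recall for the improved precision the study reports.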
In this work, we hypothesized that shifts in the food microbiome can be used as an indicator of unexpected contaminants or environmental changes. To test this hypothesis, we sequenced the total RNA of 31 high protein powder (HPP) samples of poultry meal pet food ingredients. We developed a microbiome analysis pipeline employing a key eukaryotic matrix filtering step that improved microbe detection specificity to >99.96% during in silico validation. The pipeline identified 119 microbial genera per HPP sample on average with 65 genera present in all samples. The most abundant of these were Bacteroides, Clostridium, Lactococcus, Aeromonas, and Citrobacter. We also observed shifts in the microbial community corresponding to ingredient composition differences. When comparing culture-based results for Salmonella with total RNA sequencing, we found that Salmonella growth did not correlate with multiple sequence analyses. We conclude that microbiome sequencing is useful to characterize complex food microbial communities, while additional work is required for predicting specific species' viability from total RNA sequencing.
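The matrix-filtering idea, removing reads attributable to the dominant eukaryotic ingredient (e.g., poultry) before microbial classification, can be illustrated with a toy k-mer screen. This is only a conceptual sketch: the pipeline in the study would use a real aligner and reference genomes, and the sequences, k size, and threshold here are invented.

```python
# Toy sketch of a eukaryotic matrix-filtering step (hypothetical example,
# not the study's pipeline): discard reads sharing k-mers with the matrix
# reference so that downstream classification sees mostly microbial reads.

def kmers(seq, k=5):
    """All overlapping k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def filter_matrix_reads(reads, matrix_ref, k=5, max_shared=0):
    """Keep reads sharing at most `max_shared` k-mers with the matrix."""
    ref_kmers = kmers(matrix_ref, k)
    return [r for r in reads if len(kmers(r, k) & ref_kmers) <= max_shared]

matrix = "ACGTACGTGGCCTT"        # stand-in for the host/matrix reference
reads = ["ACGTACGT",             # overlaps the matrix -> filtered out
         "TTTTGGGGAAAA"]         # no shared 5-mers -> kept as candidate microbe
print(filter_matrix_reads(reads, matrix))  # ['TTTTGGGGAAAA']
```

The specificity figure quoted above (>99.96%) comes from the study's in silico validation of its full pipeline, not from a screen this simple.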
Abstract Summary: Computational workloads for genome-wide association studies (GWAS) are growing in scale and complexity, outpacing the capabilities of single-threaded software designed for personal computers. The BlueSNP R package implements GWAS statistical tests in the R programming language and executes the calculations across computer clusters configured with Apache Hadoop, a de facto standard framework for distributed data processing using the MapReduce formalism. BlueSNP makes computationally intensive analyses, such as estimating empirical p-values via data permutation and searching for expression quantitative trait loci over thousands of genes, feasible for large genotype–phenotype datasets. Availability and implementation: http://github.com/ibm-bioinformatics/bluesnp Contact: rjprill@us.ibm.com Supplementary information: Supplementary data are available at Bioinformatics online.
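The permutation procedure that makes empirical p-values so computationally intensive (and hence worth distributing across a cluster) is simple at its core: shuffle phenotype labels many times and count how often the shuffled statistic matches or exceeds the observed one. The sketch below is a serial illustration in Python, not BlueSNP's R/Hadoop implementation; the test statistic and data are invented.

```python
# Hedged sketch of an empirical p-value by phenotype permutation
# (illustrative only; BlueSNP distributes this work via MapReduce).
import random

def perm_pvalue(genotypes, phenotypes, n_perm=999, seed=0):
    rng = random.Random(seed)

    def stat(phen):
        # Toy statistic: absolute mean phenotype difference between genotypes.
        g1 = [p for g, p in zip(genotypes, phen) if g == 1]
        g0 = [p for g, p in zip(genotypes, phen) if g == 0]
        return abs(sum(g1) / len(g1) - sum(g0) / len(g0))

    observed = stat(phenotypes)
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)          # break the genotype-phenotype link
        if stat(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)   # add-one correction avoids p = 0

genos = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
phenos = [3.0, 2.9, 3.1, 3.2, 2.8, 1.0, 1.1, 0.9, 1.2, 0.8]
p = perm_pvalue(genos, phenos)
print(p < 0.05)  # True: the association survives permutation
```

Repeating this for millions of SNPs, each with thousands of permutations, is the embarrassingly parallel workload that MapReduce handles well.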
Biological networks, such as those describing gene regulation, signal transduction, and neural synapses, are representations of large-scale dynamic systems. Discovery of organizing principles of biological networks can be enhanced by embracing the notion that there is a deep interplay between network structure and system dynamics. Recently, many structural characteristics of these non-random networks have been identified, but the dynamical implications of these features have not been explored comprehensively. We demonstrate by exhaustive computational analysis that a dynamical property, stability (robustness to small perturbations), is highly correlated with the relative abundance of small subnetworks (network motifs) in several previously determined biological networks. We propose that robust dynamical stability is an influential property that can determine the non-random structure of biological networks.
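The stability property in question is standard linear stability: a network's dynamics, linearized around a steady state, are stable if every eigenvalue of the Jacobian has negative real part. The sketch below illustrates this criterion on two invented 3-node networks, one carrying a negative feedback loop (a motif often associated with stability) and one a runaway positive loop; it is a conceptual example, not the exhaustive analysis described in the abstract.

```python
# Hedged sketch of the linear stability criterion on toy 3-node networks.
# The weights and topologies are invented for illustration.
import numpy as np

def is_stable(jacobian):
    """Stable iff every eigenvalue has negative real part."""
    return bool(np.all(np.linalg.eigvals(jacobian).real < 0))

decay = -1.0 * np.eye(3)             # self-degradation on each node

feedback = np.array([[0.0, 0.0, -0.5],   # node 3 represses node 1
                     [0.5, 0.0,  0.0],   # node 1 activates node 2
                     [0.0, 0.5,  0.0]])  # node 2 activates node 3

runaway = np.array([[0.0, 0.0, 2.0],     # same cycle, strong activation
                    [2.0, 0.0, 0.0],
                    [0.0, 2.0, 0.0]])

print(is_stable(decay + feedback))   # True: negative feedback loop is stable
print(is_stable(decay + runaway))    # False: positive loop overwhelms decay
```

Scoring many networks this way under sampled weights is one route to correlating motif abundance with the probability of dynamical stability.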