The International Cancer Genome Consortium (ICGC)'s Pan-Cancer Analysis of Whole Genomes (PCAWG) project aimed to categorize somatic and germline variations in both coding and non-coding regions in over 2,800 cancer patients. To provide this dataset to the research working groups for downstream analysis, the PCAWG Technical Working Group marshalled ~800 TB of sequencing data from distributed geographical locations; developed portable software for uniform alignment, variant calling, artifact filtering and variant merging; performed the analysis in a geographically and technologically disparate collection of compute environments; and disseminated high-quality, validated consensus variants to the working groups. The PCAWG dataset has been mirrored to multiple repositories and can be located using the ICGC Data Portal. The PCAWG workflows are also available as Docker images through Dockstore, enabling researchers to replicate our analysis on their own data.
Data commons collate data with cloud computing infrastructure and commonly used software services, tools and applications to create biomedical resources for the large-scale management, analysis, harmonization, and sharing of biomedical data. Over the past few years, data commons have been used to analyze, harmonize and share large-scale genomic datasets. Data ecosystems can be built by interoperating multiple data commons. Curating, importing and analyzing the data in a data commons can be quite labor-intensive. Data lakes offer an alternative to data commons: they simply provide access to data, deferring curation and analysis and delegating them to those who access the data. We review software platforms for managing, analyzing and sharing genomic data, with an emphasis on data commons, but also covering data ecosystems and data lakes.
The majority of pharmacogenomic (PGx) studies have been conducted on European ancestry populations, thereby excluding minority populations and impeding the discovery and translation of African American–specific genetic variation into precision medicine. Without accounting for variants found in African Americans, clinical recommendations based solely on genetic biomarkers found in European populations could result in misclassification of drug response in African American patients. To address these challenges, we formed the Transdisciplinary Collaborative Center (TCC), African American Cardiovascular Pharmacogenetic Consortium (ACCOuNT), to discover novel genetic variants in African Americans related to clinically actionable cardiovascular phenotypes and to incorporate African American–specific sequence variations into clinical recommendations at the point of care. The TCC consists of two research projects focused on discovery and translation of genetic findings and four cores that support the projects. In addition, the largest repository of PGx information on African Americans is being established, as well as lasting infrastructure that can be utilized to spur continued research in this understudied population.
The Earth Observing One (EO-1) satellite was launched in November 2000 as a one-year technology demonstration mission for a variety of space technologies. After the first year, it was used as a pathfinder for the creation of SensorWebs. A SensorWeb is the integration of a variety of space, airborne and ground sensors into a loosely coupled collaborative sensor system that automatically provides useful data products. Typically, a SensorWeb comprises heterogeneous sensors tied together with an open messaging architecture and web services. SensorWebs provide easier access to sensor data, automated data product production and rapid data product delivery. Disasters are an ideal arena in which to test SensorWeb functionality, since emergency workers and managers need easy and rapid access to satellite, airborne and in-situ sensor data as decision support tools. The Namibia Early Flood Warning SensorWeb pilot project was established to experiment with various aspects of sensor interoperability and SensorWeb functionality. The SensorWeb system features EO-1 data along with data sets from other satellites such as Radarsat, Terra and Aqua. Finally, the SensorWeb team began to examine how to measure the economic impact of SensorWeb technology infusion. This paper describes the architecture and software components that were developed, along with the performance improvements that were experienced. Problems and challenges that were encountered are also described, along with a vision for future enhancements to mitigate some of them.
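The loosely coupled, message-driven design can be pictured with a toy sketch. The following Python snippet is purely illustrative and assumes hypothetical sensor names and a flood-gauge data product; a real SensorWeb uses open web-service messaging standards rather than an in-process queue.

```python
import json
import queue
import threading
import time

# Toy in-process "message bus"; a real SensorWeb uses open web-service
# messaging rather than a Python queue, but the loose coupling is the same idea.
bus = queue.Queue()

def sensor(name, kind, read_fn, period_s=0.1, count=3):
    """Publish observations from one heterogeneous sensor as self-describing messages."""
    for _ in range(count):
        bus.put(json.dumps({"sensor": name, "kind": kind,
                            "time": time.time(), "value": read_fn()}))
        time.sleep(period_s)

def build_product(expected):
    """Consume messages from any sensor and assemble a simple flood-warning data product."""
    readings = [json.loads(bus.get()) for _ in range(expected)]
    gauges = [r["value"] for r in readings if r["kind"] == "river_gauge_m"]
    return {"max_gauge_m": max(gauges, default=None), "n_observations": len(readings)}

# Hypothetical sensors: a satellite-derived flood extent and an in-situ river gauge.
threads = [
    threading.Thread(target=sensor, args=("EO-1_ALI", "flood_extent_km2", lambda: 12.5)),
    threading.Thread(target=sensor, args=("gauge_07", "river_gauge_m", lambda: 3.2)),
]
for t in threads:
    t.start()
print(build_product(expected=6))
for t in threads:
    t.join()
```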
Suppose that a large number of parameterized trajectories $\gamma$ of a dynamical system evolving in $\mathbb{R}^N$ are stored in a database. Let $\eta \subset \mathbb{R}^N$ denote a parameterized path in Euclidean space, and let $\|\cdot\|$ denote a norm on the space of paths. Data structures and indices for trajectories are defined, and algorithms are given to answer queries of the following forms: Query 1. Given a path $\eta$, determine whether $\eta$ occurs as a subtrajectory of any trajectory $\gamma$ from the database. If so, return the trajectory; otherwise, return null. Query 2. Given a path $\eta$, return the trajectory $\gamma$ from the database which minimizes the norm $\|\eta - \gamma\|$.
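As a concrete reading of the two queries, here is a minimal Python sketch. It assumes trajectories are stored as uniformly sampled NumPy arrays and uses the sup norm over sample points as $\|\cdot\|$; the class and method names are illustrative and are not the paper's data structures or indices, which would avoid these linear scans.

```python
import numpy as np

class TrajectoryDB:
    """Naive in-memory trajectory store (illustrative; no real indexing)."""

    def __init__(self, trajectories):
        # Each trajectory gamma is a (T_i, N) array of samples in R^N.
        self.trajectories = [np.asarray(g, dtype=float) for g in trajectories]

    @staticmethod
    def _path_norm(a, b):
        # Sup norm over sample points: max_t ||a(t) - b(t)||_2.
        return float(np.max(np.linalg.norm(a - b, axis=1)))

    def query1(self, eta, tol=1e-9):
        """Query 1: return a trajectory containing eta as a subtrajectory, else None."""
        eta = np.asarray(eta, dtype=float)
        m = len(eta)
        for gamma in self.trajectories:
            for start in range(len(gamma) - m + 1):
                if self._path_norm(gamma[start:start + m], eta) <= tol:
                    return gamma
        return None

    def query2(self, eta):
        """Query 2: return the trajectory minimizing ||eta - gamma||
        (restricted here to trajectories sampled at the same length as eta)."""
        eta = np.asarray(eta, dtype=float)
        candidates = [g for g in self.trajectories if len(g) == len(eta)]
        return min(candidates, key=lambda g: self._path_norm(g, eta), default=None)

# Usage: 50 random-walk trajectories in R^3, then a nearest-trajectory query.
db = TrajectoryDB([np.cumsum(np.random.randn(200, 3), axis=0) for _ in range(50)])
closest = db.query2(np.cumsum(np.random.randn(200, 3), axis=0))
```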
With the rise of advanced workflow languages for scientific computations, Nextflow has gained increased attention from the bioinformatics community. Nextflow offers native support for advanced parallelism, which can greatly enhance resource utilization and throughput. Still, a significant portion of bioinformatics workflows are developed with the Common Workflow Language (CWL). Transitioning from CWL to Nextflow poses a significant challenge due to differences in programming models, scripting-language compatibilities, and the need for in-depth knowledge of both languages. To address this challenge, we present CNT, a novel, semi-automated translator that converts CWL workflows into Nextflow workflows. At its core, CNT uses an automated translation mechanism that converts the CommandLineTool, the most basic unit of CWL, into Nextflow's Process class. This component integrates tool-level conversion, graph dependency analysis, and correctness checks to provide highly automated translation coverage, significantly reducing development time while satisfying language-specific requirements such as building a proper dataflow model when creating workflows. Furthermore, CNT incorporates a module for aiding manual translation: it can identify three common JavaScript patterns in CWL workflows, offering further guidance to developers during the translation phase. We evaluated CNT on production-grade workflows and found that it can cover up to 81% of the original workflows, substantially reducing development time. Additionally, transitioning from a cwltool-based system to Nextflow with CNT can result in a 72% speedup and an 85% increase in CPU utilization.
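To make the tool-level conversion concrete, here is a minimal Python sketch of the general idea, not CNT's actual implementation: it maps a drastically simplified CWL CommandLineTool onto a Nextflow process skeleton, ignoring input bindings, requirements, and the JavaScript expressions that CNT's manual-translation module targets. The example tool definition is hypothetical.

```python
import yaml  # CWL documents are YAML; requires PyYAML

def cwl_tool_to_nextflow(cwl_text):
    """Translate a drastically simplified CWL CommandLineTool into a Nextflow
    process definition. Illustrative sketch only: every input is emitted as a
    plain `val`, whereas a real translator maps CWL File inputs to path channels."""
    tool = yaml.safe_load(cwl_text)
    if tool.get("class") != "CommandLineTool":
        raise ValueError("expected a CommandLineTool document")

    name = str(tool.get("id", "translated_tool")).replace("-", "_")
    inputs = list(tool.get("inputs", {}))
    outputs = tool.get("outputs", {})
    base_cmd = tool.get("baseCommand", [])
    if isinstance(base_cmd, str):
        base_cmd = [base_cmd]

    lines = [f"process {name} {{", "  input:"]
    lines += [f"    val {i}" for i in inputs]
    lines.append("  output:")
    lines += [f'    path "{o["outputBinding"]["glob"]}"' for o in outputs.values()]
    lines += ["  script:", '  """']
    lines.append("  " + " ".join(base_cmd + [f"${{{i}}}" for i in inputs]))
    lines += ['  """', "}"]
    return "\n".join(lines)

# Hypothetical CWL tool wrapping `bwa index`.
example = """
class: CommandLineTool
id: bwa-index
baseCommand: [bwa, index]
inputs:
  reference: File
outputs:
  index_files:
    outputBinding:
      glob: "*.bwt"
"""
print(cwl_tool_to_nextflow(example))
```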
We present Global-Local POS tagging, a framework for training generative stochastic Part-of-Speech models on large corpora. Global Taggers offer several advantages over their counterparts trained on small, curated corpora, including the ability to automatically extend and update their models with new text. Global Taggers also avoid a fundamental limitation of current models, whose performance relies heavily on curated text with manually assigned labels. We illustrate our approach by training several Global Taggers, implemented with generative stochastic models, on two large corpora using a high-performance computing architecture. We further demonstrate that Global Taggers can be improved by incorporating models trained on curated text, called Local Taggers, to obtain better tagging performance on specific topics.
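A minimal sketch of one way to combine a global and a local generative model, assuming tagged sentences are given as lists of (word, tag) pairs; the bigram HMM, add-one smoothing, and fixed interpolation weight below are illustrative stand-ins for the paper's generative stochastic models and training procedure.

```python
import math
from collections import defaultdict

class InterpolatedHMMTagger:
    """Bigram HMM tagger that linearly interpolates emission/transition estimates
    from a large 'global' corpus with those from a small curated 'local' corpus."""

    def __init__(self, global_sents, local_sents, lam=0.7):
        self.lam = lam  # weight on the global (large-corpus) model
        self.g_emit, self.g_trans, self.tags = self._counts(global_sents)
        self.l_emit, self.l_trans, local_tags = self._counts(local_sents)
        self.tags |= local_tags

    @staticmethod
    def _counts(sents):
        emit = defaultdict(lambda: defaultdict(int))   # tag  -> word -> count
        trans = defaultdict(lambda: defaultdict(int))  # prev -> tag  -> count
        tags = set()
        for sent in sents:
            prev = "<s>"
            for word, tag in sent:
                emit[tag][word.lower()] += 1
                trans[prev][tag] += 1
                tags.add(tag)
                prev = tag
        return emit, trans, tags

    def _prob(self, global_table, local_table, ctx, x, vocab=10_000):
        def p(table):  # add-one smoothing against a nominal vocabulary size
            total = sum(table[ctx].values())
            return (table[ctx][x] + 1) / (total + vocab)
        return self.lam * p(global_table) + (1 - self.lam) * p(local_table)

    def tag(self, words):
        """Viterbi decoding over the interpolated model."""
        emit = lambda t, w: math.log(self._prob(self.g_emit, self.l_emit, t, w.lower()))
        trans = lambda p, t: math.log(self._prob(self.g_trans, self.l_trans, p, t))
        V = [{t: trans("<s>", t) + emit(t, words[0]) for t in self.tags}]
        back = [{}]
        for i, w in enumerate(words[1:], start=1):
            V.append({}); back.append({})
            for t in self.tags:
                prev = max(self.tags, key=lambda p: V[i - 1][p] + trans(p, t))
                V[i][t] = V[i - 1][prev] + trans(prev, t) + emit(t, w)
                back[i][t] = prev
        seq = [max(self.tags, key=lambda t: V[-1][t])]
        for i in range(len(words) - 1, 0, -1):
            seq.append(back[i][seq[-1]])
        return list(reversed(seq))

# Usage with toy corpora (tags follow Penn Treebank conventions).
global_corpus = [[("rivers", "NNS"), ("flood", "VBP")], [("the", "DT"), ("flood", "NN")]]
local_corpus = [[("flood", "NN"), ("warnings", "NNS")]]
tagger = InterpolatedHMMTagger(global_corpus, local_corpus)
print(tagger.tag(["the", "flood"]))
```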