Sebastian J. Schultheiß

Bernstein Center for Computational Neuroscience Tübingen

Author Statistics

Papers

Citation

H-Index

i-10 index

Research Trends

Author Order

Document Type

Co-Authors

Jörg Hagmann

Bernstein Center for Computational Neuroscience Tübingen

Claude Becker

Ludwig-Maximilians-Universität München

Adam Nunn

Institute of Bioinformatics

Detlef Weigel

Max Planck Institute for Biology

Ioanna Kakoulidou

Technical University of Munich

Rahul Pisupati

Gregor Mendel Institute of Molecular Plant Biology

Frank Johannes

Technical University of Munich

David Langenberger

Leipzig University

Patrick Hüther

Institute of Molecular Biology

Gunnar Rätsch

ETH Zurich

Cooperative Institutions

Max Planck Society

Max Planck Institute for Developmental Biology

Agricultural Research Service

United States Department of Agriculture

University of Toronto

Heidelberg University

Memorial Sloan Kettering Cancer Center

University of Tübingen

University of New Orleans

University of Maryland, College Park

Research Field

Ten Simple Rules for Providing a Scientific Web Resource

PLoS Computational Biology (2011)

Sebastian J. Schultheiß

Many projects in computational biology lead to the creation of a small application program or collection of scripts that can be of use to other scientists. A natural progression is to make this tool available via a Web site or by creating a service for it, from now on collectively called “Web resource.” We conducted a survey among providers and users of scientific Web resources, as well as a study on availability. The following rules reflect the experiences and opinions of over 250 scientists who have answered our questions and who use Web resources regularly, as well as our own experience. The study of availability allows us to draw objective conclusions about the characteristics of those Web resources that are still available and correlate the features that distinguish them from disappeared or nonfunctional ones. These ten simple rules aid you in designing and maintaining a scientific Web resource that is available to anyone interested in using it.

Web resource

Web engineering

10.1371/journal.pcbi.1001126

Citations (15)

Recommendation: MethylScore, a pipeline for accurate and context-aware identification of differentially methylated regions from population-scale plant whole-genome bisulfite sequencing data — R1/PR9

Patrick Hüther Jörg Hagmann Adam Nunn Ioanna Kakoulidou Rahul Pisupati

Whole-genome bisulfite sequencing (WGBS) is the standard method for profiling DNA methylation at single-nucleotide resolution. Different tools have been developed to extract differentially methylated regions (DMRs), often built upon assumptions from mammalian data. Here, we present MethylScore, a pipeline to analyse WGBS data and to account for the substantially more complex and variable nature of plant DNA methylation. MethylScore uses an unsupervised machine learning approach to segment the genome by classification into states of high and low methylation. It processes data from genomic alignments to DMR output and is designed to be usable by novice and expert users alike. We show how MethylScore can identify DMRs from hundreds of samples and how its data-driven approach can stratify associated samples without prior information. We identify DMRs in the A. thaliana 1,001 Genomes dataset to unveil known and unknown genotype–epigenotype associations.

Differentially methylated regions

Bisulfite sequencing

Bisulfite

10.1017/qpb.2022.14.pr9

Citations (0)

Interpretable Machine Learning Decodes Soil Microbiome’s Response to Drought Stress

bioRxiv (Cold Spring Harbor Laboratory) (2023)

Michelle Hagen Rupashree Dass Cathy Westhues Jochen Blom Sebastian J. Schultheiß

Background: Extreme weather events induced by climate change, particularly droughts, have detrimental consequences for crop yields and food security. Concurrently, these conditions provoke substantial changes in the soil metagenome and affect plant health. Early recognition of soil affected by drought enables farmers to implement appropriate agricultural management practices. In this context, interpretable Machine Learning holds immense potential for drought stress classification in the soil metagenome based on marker taxa. Results: This study demonstrates that the metagenomic approach of Differential Abundance Analysis methods and Machine Learning-based Shapley Additive Explanation values provide similar information. They exhibit their potential as complementary approaches for identifying marker taxa and investigating their enrichment or depletion under drought stress in grass lineages. Additionally, the Random Forest Classifier trained on a diverse range of relative abundance data from the soil metagenome of various plant species achieves a high accuracy of 92.3% at the genus rank for drought stress prediction. It demonstrates its generalization capacity for the lineages tested. Conclusions: In the detection of drought stress in the soil metagenome, this study emphasizes the potential of an optimized and generalized location-based ML classifier. By identifying marker taxa, this approach holds promising implications for microbe-assisted plant breeding programs and contributes to the development of sustainable agriculture practices. These findings are crucial for preserving global food security in the face of climate change.

10.1101/2023.11.30.569182

Citations (1)

Patrick Hüther Jörg Hagmann Adam Nunn Ioanna Kakoulidou Rahul Pisupati

Differentially methylated regions

Bisulfite sequencing

Bisulfite

10.1017/qpb.2022.14.pr4

Citations (0)

KIRMES: Kernel-based Identification of Regulatory Modules in Euchromatic Sequences

German Conference on Bioinformatics (2008)

Sebastian J. Schultheiß Wolfgang Busch Jan U. Lohmann Oliver Kohlbacher Gunnar Rätsch

Kernel (algebra)

Identification

Euchromatin

Citations (1)

Review: MethylScore, a pipeline for accurate and context-aware identification of differentially methylated regions from population-scale plant whole-genome bisulfite sequencing data — R1/PR8

Patrick Hüther Jörg Hagmann Adam Nunn Ioanna Kakoulidou Rahul Pisupati

Differentially methylated regions

Bisulfite sequencing

Bisulfite

10.1017/qpb.2022.14.pr8

Citations (0)

KIRMES: kernel-based identification of regulatory modules in euchromatic sequences

Bioinformatics (2009)

Sebastian J. Schultheiß Wolfgang Busch Jan U. Lohmann Oliver Kohlbacher Gunnar Rätsch

Understanding transcriptional regulation is one of the main challenges in computational biology. An important problem is the identification of transcription factor (TF) binding sites in promoter regions of potential TF target genes. It is typically approached by position weight matrix-based motif identification algorithms using Gibbs sampling, or heuristics to extend seed oligos. Such algorithms succeed in identifying single, relatively well-conserved binding sites, but tend to fail when it comes to the identification of combinations of several degenerate binding sites, as those often found in cis-regulatory modules. We propose a new algorithm that combines the benefits of existing motif finding with the ones of support vector machines (SVMs) to find degenerate motifs in order to improve the modeling of regulatory modules. In experiments on microarray data from Arabidopsis thaliana, we were able to show that the newly developed strategy significantly improves the recognition of TF targets. The Python source code (open source-licensed under GPL), the data for the experiments and a Galaxy-based web service are available at http://www.fml.mpg.de/raetsch/suppl/kirmes/.

Heuristics

Ensembl

Identification

10.1093/bioinformatics/btp278

Citations (22)

MethylScore, a pipeline for accurate and context-aware identification of differentially methylated regions from population-scale plant WGBS data

bioRxiv (Cold Spring Harbor Laboratory) (2022)

Patrick Hüther Jörg Hagmann Adam Nunn Ioanna Kakoulidou Rahul Pisupati

Whole-genome bisulfite sequencing (WGBS) is the standard method for profiling DNA methylation at single-nucleotide resolution. Many WGBS-based studies aim to identify biologically relevant loci that display differential methylation between genotypes, treatment groups, tissues, or developmental stages. Over the years, different tools have been developed to extract differentially methylated regions (DMRs) from whole-genome data. Often, such tools are built upon assumptions from mammalian data and do not consider the substantially more complex and variable nature of plant DNA methylation. Here, we present MethylScore, a pipeline to analyze WGBS data and to account for plant-specific DNA methylation properties. MethylScore processes data from genomic alignments to DMR output and is designed to be usable by novice and expert users alike. It uses an unsupervised machine learning approach to segment the genome by classification into states of high and low methylation, substantially reducing the number of necessary statistical tests while increasing the signal-to-noise ratio and the statistical power. We show how MethylScore can identify DMRs from hundreds of samples and how its data-driven approach can stratify associated samples without prior information. We identify DMRs in the A. thaliana 1001 Genomes dataset to unveil known and unknown genotype-epigenotype associations. MethylScore is an accessible pipeline for plant WGBS data, with unprecedented features for DMR calling in small- and large-scale datasets; it is built as a Nextflow pipeline and its source code is available at https://github.com/Computomics/MethylScore.

Differentially methylated regions

Statistical power

Bisulfite sequencing

Identification

10.1101/2022.01.06.475031

Citations (7)

Oqtans: the RNA-seq workbench in the cloud for complete and reproducible quantitative transcriptome analysis

Bioinformatics (2014)

Vipin T. Sreedharan Sebastian J. Schultheiß Géraldine Jean André Kahles Regina Bohnert

We present Oqtans, an open-source workbench for quantitative transcriptome analysis that is integrated in Galaxy. Its distinguishing features include customizable computational workflows and a modular pipeline architecture that facilitates comparative assessment of tool and data quality. Oqtans integrates an assortment of machine learning-powered tools into Galaxy, which show superior or equal performance to state-of-the-art tools. Implemented tools comprise a complete transcriptome analysis workflow: short-read alignment, transcript identification/quantification and differential expression analysis. Oqtans and Galaxy facilitate persistent storage, data exchange and documentation of intermediate results and analysis workflows. We illustrate how Oqtans aids the interpretation of data from different experiments in easy-to-understand use cases. Users can easily create their own workflows and extend Oqtans by integrating specific tools. Oqtans is available as (i) a cloud machine image with a demo instance at cloud.oqtans.org, (ii) a public Galaxy instance at galaxy.cbio.mskcc.org, (iii) a git repository containing all installed software (oqtans.org/git), most of which is also available from (iv) the Galaxy Toolshed and (v) a share string to use along with Galaxy CloudMan. Contact: vipin@cbio.mskcc.org, ratschg@mskcc.org. Supplementary information: Supplementary data are available at Bioinformatics online.

Workbench

RNA-Seq

10.1093/bioinformatics/btt731

Citations (14)

Approaches taken, progress made, and enhanced utility of long read-based goat, swine, cattle and sheep reference genomes

Plant and Animal Genome XXIV Conference (2016)

Timothy P. L. Smith Sergey Koren Adam M. Phillippy Derek M. Bickhart Benjamin D. Rosen

Citations (1)