Sequencing technologies, in particular RNASeq, have become critical tools in the design, build, test and learn cycle of synthetic biology. They provide a better understanding of synthetic designs, and they help identify ways to improve and select designs. While these data are beneficial to design, collecting and analyzing them is a complex, multistep process that has implications for both discovery and reproducibility of experiments. Additionally, tool parameters, experimental metadata, normalization of data and standardization of file formats present challenges that are computationally intensive. This calls for high-throughput pipelines expressly designed to handle the combinatorial and longitudinal nature of synthetic biology. In this paper, we present a pipeline to maximize the analytical reproducibility of RNASeq for synthetic biologists. We also explore the impact of reproducibility on the validation of machine learning models. We present the design of a pipeline that combines traditional RNASeq data processing tools with structured metadata tracking to allow for the exploration of the combinatorial design in a high-throughput and reproducible manner. We then demonstrate utility via two different experiments: a control comparison experiment and a machine learning model experiment. The first experiment compares datasets collected from identical biological controls across multiple days for two different organisms. It shows that a reproducible experimental protocol for one organism does not guarantee reproducibility in another. The second experiment quantifies the differences in experimental runs from multiple perspectives. It shows that the lack of reproducibility from these different perspectives can place an upper bound on the validation of machine learning models trained on RNASeq data.
ABSTRACT Blood transcriptional signatures are promising for tuberculosis (TB) diagnosis but have not been evaluated among U.S. patients. To be used clinically, transcriptional classifiers need reproducible accuracy in diverse populations that vary in genetic composition, disease spectrum and severity, and comorbidities. In a prospective case-control study, we identified novel transcriptional classifiers for active TB among U.S. patients and systematically compared their accuracy to classifiers from published studies. Blood samples from HIV-uninfected U.S. adults with active TB, pneumonia, or latent TB infection underwent whole-transcriptome microarray analysis. We used support vector machines to classify disease state based on transcriptional patterns. We externally validated our classifiers using data from sub-Saharan African cohorts and evaluated previously published transcriptional classifiers in our population. Our classifier distinguishing active TB from pneumonia had an area under the receiver operating characteristic curve (AUC) of 96.5% (95.4% to 97.6%) among U.S. patients, but the AUC was lower (90.6% [89.6% to 91.7%]) in HIV-uninfected sub-Saharan Africans. Previously published comparable classifiers had AUC values of 90.0% (87.7% to 92.3%) and 82.9% (80.8% to 85.1%) when tested in U.S. patients. Our classifier distinguishing active TB from latent TB had AUC values of 95.9% (95.2% to 96.6%) among U.S. patients and 95.3% (94.7% to 96.0%) among sub-Saharan Africans. Previously published comparable classifiers had AUC values of 98.0% (97.4% to 98.7%) and 94.8% (92.9% to 96.8%) when tested in U.S. patients. Blood transcriptional classifiers accurately detected active TB among U.S. adults. The accuracy of classifiers for active TB versus that of other diseases decreased when tested in new populations with different disease controls, suggesting additional studies are required to enhance generalizability.
Classifiers that distinguish active TB from latent TB are accurate and generalizable across populations and can be explored as screening assays.
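The AUC values reported above summarize how well a classifier's scores separate two disease states. As an illustration only, the following minimal sketch computes an AUC from labels and decision scores using the pairwise-ranking interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half). The function name `roc_auc` and the toy scores are hypothetical; this is not the study's analysis code, and it does not reproduce the reported confidence intervals.

```python
def roc_auc(labels, scores):
    """Return the ROC AUC for binary labels (1 = positive, 0 = negative).

    Computed as the fraction of positive/negative pairs in which the
    positive case has the higher score; tied pairs count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical decision scores, e.g. from a support vector machine.
    toy_labels = [1, 0, 1, 0]
    toy_scores = [0.9, 0.8, 0.3, 0.1]
    print(f"AUC = {roc_auc(toy_labels, toy_scores):.2f}")
```

The quadratic pairwise loop is chosen for readability; for large cohorts a rank-based implementation from an established statistics library would be preferable.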
This study quantified eight small-molecule neurotransmitters collected simultaneously from the prefrontal cortex of C57BL/6J mice (n=23) during wakefulness and during isoflurane anesthesia (1.3%). Using isoflurane anesthesia as an independent variable enabled evaluation of the hypothesis that isoflurane anesthesia differentially alters concentrations of multiple neurotransmitters and their interactions. Machine learning was applied to reveal higher order interactions among neurotransmitters. Using a between-subjects design, microdialysis was performed during wakefulness and during anesthesia. Concentrations (nM) of acetylcholine, adenosine, dopamine, GABA, glutamate, histamine, norepinephrine, and serotonin in the dialysis samples are reported (mean ± SD). Relative to wakefulness, acetylcholine concentration was lower during isoflurane anesthesia (1.254 ± 1.118 versus 0.401 ± 0.134, P=0.009), and concentrations of adenosine (29.456 ± 29.756 versus 101.321 ± 38.603, P<0.001), dopamine (0.0578 ± 0.0384 versus 0.113 ± 0.084, P=0.036), and norepinephrine (0.126 ± 0.080 versus 0.219 ± 0.066, P=0.010) were higher during anesthesia. Isoflurane reconfigured neurotransmitter interactions in prefrontal cortex, and the state of isoflurane anesthesia was reliably predicted by prefrontal cortex concentrations of adenosine, norepinephrine, and acetylcholine. A novel finding to emerge from machine learning analyses is that neurotransmitter concentration profiles in mouse prefrontal cortex undergo functional reconfiguration during isoflurane anesthesia. Adenosine, norepinephrine, and acetylcholine showed high feature importance, supporting the interpretation that interactions among these three transmitters may play a key role in modulating levels of cortical and behavioral arousal.
Mycobacterium africanum lineage (L) 6 is an important pathogen in West Africa, causing up to 40% of pulmonary tuberculosis (TB). The biology underlying the clinical differences between M. africanum and M. tuberculosis sensu stricto remains poorly understood. We profiled ex vivo expression of 2179 genes of M. tuberculosis L4, the most geographically dispersed cause of human TB, and of the geographically restricted M. africanum L6, directly from sputa of 11 HIV-negative TB patients from The Gambia who had not started treatment. The DosR regulon was the most significantly decreased category in L6 relative to L4. Further, we identified nonsynonymous mutations in major DosR regulon genes of 44 L6 genomes of TB patients from The Gambia and Ghana. Using Lebek's test, we assessed differences in oxygen requirements for growth. L4 grew only at the aerobic surface while L6 grew throughout the medium. In the host, the DosR regulon is critical for the adaptation of M. tuberculosis to oxygen limitation. However, M. africanum L6 appears to have adapted to growth under hypoxic conditions or to different biological niches. The observed underexpression of DosR in L6 fits with the genomic changes in DosR genes, microaerobic growth and the association with extrapulmonary disease.
Abstract. Background. Viruses are underrepresented taxa in the study and identification of microbiome constituents; however, they play an essential role in health, microbiome regulation, and transfer of genetic material. Only a few thousand viruses have been isolated, sequenced, and assigned a taxonomy, which further limits the ability to identify and quantify viruses in the microbiome. Additionally, the vast diversity of viruses represents a challenge for classification, not only in constructing a viral taxonomy, but also in identifying similarities between a virus’ genotype and its phenotype. However, the diversity of viral sequences can be leveraged to classify their sequences in metagenomic and metatranscriptomic samples. Methods. To identify and quantify viruses in transcriptomic and genomic samples, we developed a dynamic programming algorithm for creating a classification tree out of 715,672 metagenome viruses. To create the classification tree, we clustered proportional similarity scores generated from the k-mer profiles of each of the metagenome viruses. We then integrated the viral classification tree with the NCBI taxonomy for use with ParaKraken (a parallelized version of Kraken), a metagenomic/transcriptomic classifier. The resulting database of the metagenomic viruses is compatible with Kraken2 and can be found here: https://www.osti.gov/biblio/1615774. Results. To illustrate the breadth of our utility for classifying viruses with ParaKraken, especially in samples without virus-induced pathophysiology, we analyzed data from a plant metagenome study identifying the differences between two Populus genotypes in three different compartments and from a human metatranscriptome study identifying the differences between Autism Spectrum Disorder patients and controls in post mortem brain tissue.
In the Populus study, we identified genotype and compartment-specific viral signatures, while in the Autism study we identified a significant increase in abundance of eight viral sequences in post mortem brains. We also show the potential accuracy for classifying viruses by utilizing both the JGI and NCBI viral databases to identify the uniqueness of viral sequences. Finally, we utilize the NCBI databases to identify pathogenic viruses in known COVID-19 and cassava brown streak virus infection samples to validate the potential usefulness of classifying viruses. Conclusion. Viruses represent an essential component of the microbiome. The ability to classify viruses represents the necessary first step in better understanding their role in the microbiome. Our viral classification method allows for a more complete identification of viral sequences than previous methods. This will improve identification of associations between viruses and their hosts as well as viruses and other microbiome members and can be used with any tool that utilizes a taxonomy for classification (such as Kraken).
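The Methods above cluster proportional similarity scores derived from k-mer profiles. The abstract does not define the score, so the following is only a minimal sketch under a stated assumption: proportional similarity is taken here to be the sum, over shared k-mers, of the smaller of the two relative frequencies (1.0 for identical profiles, 0.0 for profiles with no k-mer in common). The function names, the choice of k, and the toy sequences are hypothetical and are not taken from the authors' implementation.

```python
from collections import Counter


def kmer_profile(seq, k=3):
    """Return the relative frequency of every k-mer in a sequence."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    if total == 0:
        return {}  # sequence shorter than k
    return {kmer: n / total for kmer, n in counts.items()}


def proportional_similarity(p, q):
    """ASSUMED definition: sum of min relative frequencies over shared k-mers."""
    return sum(min(p[kmer], q[kmer]) for kmer in p.keys() & q.keys())


if __name__ == "__main__":
    # Toy sequences standing in for viral genomes.
    a = kmer_profile("AAAC", k=2)  # AA: 2/3, AC: 1/3
    b = kmer_profile("AACC", k=2)  # AA: 1/3, AC: 1/3, CC: 1/3
    print(f"similarity = {proportional_similarity(a, b):.3f}")
```

A pairwise matrix of such scores could then be passed to a clustering step; the dynamic programming tree construction described in the Methods is not reproduced here.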
Drought stress is a recurring feature of world climate and the single most important factor influencing agricultural yield worldwide. Plants display highly variable, species-specific responses to drought and these responses are multifaceted, requiring physiological and morphological changes influenced by genetic and molecular mechanisms. Moreover, water deficit studies are difficult to reproduce, which significantly impedes research on drought tolerance, because how a plant responds is highly influenced by the timing, duration, and intensity of the water deficit. Despite progress in the identification of drought-related mechanisms in many plants, the molecular basis of drought resistance remains to be fully understood in trees, particularly in poplar species because their wide geographic distribution results in varying tolerances to drought. Herein, we aimed to better understand this complex phenomenon in eastern cottonwood (Populus deltoides) by performing a detailed contrast of the proteome changes between two different water deficit experiments to identify functional intersections and divergences in proteome responses. We investigated plants subjected to cyclic water deficit and compared these responses to plants subjected to prolonged acute water deficit. In total, we identified 108,012 peptide sequences across both experiments that provided insight into the quantitative state of 22,737 Populus gene models and 8,199 functional protein groups in response to drought. Together, these datasets provide the most comprehensive insight into proteome drought responses in poplar to date and a direct proteome comparison between short period dehydration shock and cyclic, post-drought re-watering. Overall, this investigation provides novel insights into drought avoidance mechanisms that are distinct from progressive drought stress. Additionally, we identified proteins that previous studies have reported as drought-relevant.
Importantly, we highlight the RD26 transcription factor, which is regulated at both the transcript and protein level regardless of species and drought condition and thus represents a key, universal drought marker for Populus species.
Plant drought stress causes systematic changes to photosynthesis, metabolism, growth, and potentially the phytobiome. Additionally, drought affects plants in both a species-specific and water-deficit-driven manner, causing the response to drought to be dependent both on how drought is being experienced and on any adaptation to prior drought exposure. Thus, understanding the effect of drought on plants requires assessing drought response in multiple conditions, such as progressive acute drought and recurrent cyclic drought, and at different levels of severity. In this study, we have utilized RNA sequencing to identify changes to the plant transcriptome and the phytobiome during both acute progressive drought and cyclic drought at multiple severities. Co-analysis of the plant and phytobiome, utilizing the same RNAseq data, allows for the identification of novel associations that would not be possible otherwise. We have identified that the drought response ranges from increased transcripts related to photosynthesis and metabolic activity in mild acute drought to decreased transcripts related to photosynthesis and metabolic impairment in severe drought. Moreover, while water deficit is a main driver of transcriptional responses in severe drought, there are increases in reactive oxygen species (ROS) metabolism and photosynthetic transcripts in cyclic severe drought compared with acute severe drought, independent of water deficit. The phytobiome exhibits alternate responses to drought when compared with the transcriptome. Specifically, the phytobiome is affected more by the cyclic or acute nature of the drought rather than the severity of the drought, with the phytobiome having an increase in taxa under cyclic drought that are often reported to have beneficial effects on the plants. 
Lastly, we have identified associations between taxa in the phytobiome and the expression of disease response, ROS metabolism, and photosynthesis transcripts, suggesting interplay between the host plant and its phytobiome in response to drought.
Background. Treatment initiation rapidly kills most drug-susceptible Mycobacterium tuberculosis, but a bacterial subpopulation tolerates prolonged drug exposure. We evaluated drug-tolerant bacilli in human sputum by comparing messenger RNA (mRNA) expression of drug-tolerant bacilli that survive the early bactericidal phase with that of treatment-naive bacilli. Methods. M. tuberculosis gene expression was quantified via reverse-transcription polymerase chain reaction in serial sputa from 17 Ugandans treated for drug-susceptible pulmonary tuberculosis. Results. Within 4 days, bacterial mRNA abundance declined >98%, indicating rapid killing. Thereafter, the rate of decline slowed >94%, indicating drug tolerance. After 14 days, 16S ribosomal RNA transcripts/genome declined 96%, indicating slow growth. Drug-tolerant bacilli displayed marked downregulation of genes associated with growth, metabolism, and lipid synthesis and upregulation of stress responses and key regulatory categories, including stress-associated sigma factors, transcription factors, and toxin-antitoxin genes. Drug efflux pumps were upregulated. The isoniazid stress signature was induced by initial drug exposure, then disappeared after 4 days. Conclusions. Transcriptional patterns suggest that drug-tolerant bacilli in sputum are in a slow-growing, metabolically and synthetically downregulated state. Absence of the isoniazid stress signature in drug-tolerant bacilli indicates that physiological state influences drug responsiveness in vivo. These results identify novel drug targets that should aid in the development of shorter tuberculosis treatment regimens.