Abstract Gene losses provide an insightful route for studying the morphological and physiological adaptations of species, but their discovery is challenging. Existing genome annotation tools focus on annotating intact genes and do not attempt to distinguish nonfunctional genes from genes missing annotation due to sequencing and assembly artifacts. Previous attempts to annotate gene losses have required significant manual curation, which hampers their scalability for the ever-increasing deluge of newly sequenced genomes. Using extreme sequence erosion (amino acid deletions and substitutions) and sister species support as an unambiguous signature of loss, we developed an automated approach for detecting high-confidence gene loss events across a species tree. Our approach relies solely on gene annotation in a single reference genome, raw assemblies for the remaining species to analyze, and the associated phylogenetic tree for all organisms involved. Using human as reference, we discovered over 400 unique human ortholog erosion events across 58 mammals. This includes dozens of clade-specific losses of genes that result in early mouse lethality or are associated with severe human congenital diseases. Our discoveries yield intriguing potential for translational medical genetics and evolutionary biology, and our approach is readily applicable to large-scale genome sequencing efforts across the tree of life.
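The loss signature described in this abstract (extreme erosion of the ortholog plus support from a sister species) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names `erosion_fraction` and `call_losses`, the pre-aligned input strings, the 0.5 threshold, and the reading of sister species support as the sister lineage showing comparable erosion.

```python
def erosion_fraction(ref_aln: str, query_aln: str) -> float:
    """Fraction of reference residues that are deleted ('-') or substituted
    in the query. Both inputs are rows of one pairwise alignment."""
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    ref_positions = 0
    eroded = 0
    for r, q in zip(ref_aln, query_aln):
        if r == "-":
            continue  # insertion relative to the reference; ignored in this sketch
        ref_positions += 1
        if q != r:  # covers both deletions ('-') and substitutions
            eroded += 1
    return eroded / ref_positions if ref_positions else 0.0


def call_losses(erosion: dict, sister_of: dict, threshold: float = 0.5) -> set:
    """Return species whose ortholog is heavily eroded AND whose sister
    species is eroded too (hypothetical rule, hypothetical threshold)."""
    calls = set()
    for species, frac in erosion.items():
        sister = sister_of.get(species)
        if sister is None:
            continue  # no sister available, so no support: do not call a loss
        if frac >= threshold and erosion.get(sister, 0.0) >= threshold:
            calls.add(species)
    return calls
```

Requiring agreement between sister lineages is what keeps a single poorly assembled genome from being reported as a loss in this sketch; a production pipeline would also need to handle orthology mapping and assembly gaps, which are omitted here.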
A fundamental step in gene-regulatory activities, such as repression, transcription, and recombination, is the binding of regulatory DNA-binding proteins (DBPs) to specific targets in the genome. To rapidly localize their regulatory genomic sites, DBPs reduce the dimensionality of the search space by combining three-dimensional (3D) diffusion in solution with one-dimensional (1D) sliding along DNA. However, the requirement to form a thermodynamically stable protein–DNA complex at the cognate genomic target sequence imposes a challenge on the protein because, as it navigates one-dimensionally along the genome, it may come in close contact with sites that share partial or even complete sequence similarity with the functional DNA sequence. This creates a conflict between two basic requirements: finding the cognate site quickly and binding it stably. Here, we structurally assessed the interfaces adopted by a variety of DBPs to bind DNA specifically and nonspecifically, and found that many DBPs utilize one interface to specifically recognize a DNA sequence and another to assist in propagating along the DNA through nonspecific associations. While these two interfaces overlap in some proteins, they overlap only partially in others, which frustrates the protein–DNA interface. Using coarse-grained molecular dynamics simulations, we demonstrate that the existence of frustration in DBPs is a compromise between rapid 1D diffusion along other regions in the genome (high frustration smooths the landscape for sliding) and rapid formation of a stable and essentially active protein–DNA complex (low frustration reduces the free energy barrier for switching between the two binding modes).
Distantly related species entering similar biological niches often adapt by evolving similar morphological and physiological characters. How much genomic molecular convergence (particularly of highly constrained coding sequence) contributes to convergent phenotypic evolution, such as echolocation in bats and whales, is a long-standing fundamental question. Like others, we find that convergent amino acid substitutions are not more abundant in echolocating mammals compared to their outgroups. However, we also ask a more informative question about the genomic distribution of convergent substitutions by devising a test to determine which, if any, of more than 4,000 tissue-affecting gene sets is most statistically enriched with convergent substitutions. We find that the gene set most overrepresented (q-value = 2.2e-3) with convergent substitutions in echolocators, affecting 18 genes, regulates development of the cochlear ganglion, a structure with empirically supported relevance to echolocation. Conversely, when comparing to nonecholocating outgroups, no significant gene set enrichment exists. For aquatic and high-altitude mammals, our analysis highlights 15 and 16 genes from the gene sets most affected by molecular convergence which regulate skin and lung physiology, respectively. Importantly, our test requires that the most convergence-enriched set cannot also be enriched for divergent substitutions, such as in the pattern produced by inactivated vision genes in subterranean mammals. Showing a clear role for adaptive protein-coding molecular convergence, we discover nearly 2,600 convergent positions, highlight 77 of them in 3 organs, and provide code to investigate other clades across the tree of life.
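A gene set enrichment test of the kind described above can be sketched as follows. The abstract does not state which statistic the authors used, so this example substitutes a one-sided hypergeometric test with Benjamini-Hochberg correction, a common choice for over-representation analysis. The function names and the gene-level formulation are assumptions for illustration, and the additional requirement that the top set not be enriched for divergent substitutions is not modeled.

```python
from math import comb


def hypergeom_sf(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric: a population of M genes, n of which
    carry a convergent substitution, and a gene set of size N."""
    lo = max(k, 0)
    hi = min(n, N)
    total = comb(M, N)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(lo, hi + 1)) / total


def bh_qvalues(pvals: list) -> list:
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q
```

In use, one p-value would be computed per gene set with `hypergeom_sf`, and the whole list passed to `bh_qvalues`; with thousands of gene sets tested, the correction step is what makes a reported q-value meaningful.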
Abstract Introduction: We present the comprehensive genomic profiling performance of the Ion Torrent Genexus system using the Oncomine Comprehensive Assay Plus (OCA Plus), a 500+ gene targeted AmpliSeq-based oncology research panel that evaluates DNA variants (including copy number alterations), RNA fusions, and key oncology research endpoints including tumor mutational burden (TMB), microsatellite instability (MSI), and homologous recombination repair deficiency (HRD) via characterization of genomic instability by the newly introduced Genomic Instability Metric (GIM). Methods: The Ion Torrent Genexus System provides comprehensive genomic profiling via an automated sample-to-report workflow with next-day results. The Genexus System supports oncology research panels such as OCA Plus, which comprises over 13,000 amplicons and requires an input of just 20 ng of FFPE DNA and RNA. This study utilized cell lines, reference controls, and orthogonally tested FFPE research samples to evaluate detection of DNA variants, copy number alterations, RNA fusions, and key research endpoints, including MSI, TMB, and HRD. The OCA Plus panel was also evaluated for the ability to detect arm-level copy number changes in orthogonally validated FFPE samples. Results: Commercial reference controls and FFPE research samples were sequenced using OCA Plus on the Genexus System to an average depth of ≥24 million reads per sample, with four DNA and RNA samples supported per run. SNV and MNV calling performance was assessed using the AcroMetrix Oncology Hotspot Control, which has 377 variants covered by OCA Plus, and delivered SNV sensitivity and PPV >99% and MNV sensitivity of >99% and PPV >95%. MSI status was assessed using orthogonally tested FFPE samples from various tumor tissues (stomach, endometrial, colorectal) and returned status concordance of 99.4% with sensitivity and PPV >99%.
The TMB endpoint was tested using commercial controls and FFPE samples with a correlation of r² > 0.90 to orthogonal measurements. RNA fusion reference controls showed 100% positive correlation. Copy number gain detection shows sensitivity of 99% and PPV >95%, while homologous copy loss gives 100% PPV and sensitivity of >90%. We also demonstrate high concordance to orthogonal methods in detection of HER2 amplifications, and the ability to detect arm-level copy number alterations such as 1p/19q co-deletions in IDH1-positive glioma samples. Conclusion: The Genexus System combines minimal touch points and a rapid turnaround time to enable comprehensive genomic profiling for research assays such as OCA Plus for detection of rare variants and low-level fusion transcripts. Further, by providing accurate characterization of key oncology research endpoints, the Genexus System can accelerate research in oncology. For research use only. Not for use in diagnostic procedures. Citation Format: Geoffrey Marc Lowman, Dinesh Cyanam, Emily Norris, Michelle Toro, Coleen Nemes, Tanaya Puranik, Yan Zhu, Alex Phan, Derek Wong, Portia Bernado, Anelia Kralcheva, Srinivas Nallandhighal, Loni Pickle, April Bigley, Mohit Gupta, Ying Jin, Sameh El-Difrawy, Amir Marcovitz, Fatima Zare, Charles Scafe, Yu-Ting Tseng, Jianjun Guo, Vinay Mittal, Scott Myrand, Santhoshi Bandla, Paul Williams, Eugene Ingerman, Elaine Wong-Ho, Seth Sadis, Mark Andersen, Rob Bennett. Fully automated comprehensive genomic profiling for detection of cancer variants, gene fusions, and complex oncology endpoints [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 232.
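The sensitivity and PPV figures quoted in the abstract above follow the standard confusion-matrix definitions, shown here as a short sketch. The function names are hypothetical, and any counts passed to them in examples are illustrative, not data from the study.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: the share of truly present variants that were called."""
    if tp + fn == 0:
        raise ValueError("no positive truth calls; sensitivity is undefined")
    return tp / (tp + fn)


def ppv(tp: int, fp: int) -> float:
    """Positive predictive value: the share of called variants that are real."""
    if tp + fp == 0:
        raise ValueError("no positive calls; PPV is undefined")
    return tp / (tp + fp)
```

For example, with made-up counts of 90 true positives, 10 false negatives, and 30 false positives, `sensitivity(90, 10)` gives 0.9 and `ppv(90, 30)` gives 0.75. Reporting the two together matters because a caller can raise one at the expense of the other.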
Water molecules are abundant in protein–DNA interfaces, especially in their nonspecific complexes. In this study, we investigated the organization and energetics of the interfacial water by simplifying the geometries of the proteins and the DNA to represent them as two equally and oppositely charged planar surfaces immersed in water. We found that the potential of mean force for bringing the two parallel surfaces into close proximity comprises energetic barriers whose properties strongly depend on the charge density of the surfaces. We demonstrated how the organization of the water molecules into discretized layers and the corresponding energetic barriers to dehydration can be modulated by the charge density on the surfaces, salt, and the structure of the surfaces. The 1–2 layers of ordered water are tightly bound to the charged surfaces representing the nonspecific protein–DNA complex. This suggests that water might mediate one-dimensional diffusion of proteins along DNA (sliding) by screening attractive electrostatic interactions between the positively charged molecular surface on the protein and the negatively charged DNA backbone and, in doing so, reduce intermolecular friction in a manner that smooths the energetic landscape for sliding and facilitates the 1D diffusion of the protein.
Abstract Introduction: We describe the development and performance of a new sample-to-report targeted sequencing solution for testing solid tissue cancers using the Genexus Integrated Sequencing System and accompanying software. The assay is designed for research applications from either formalin-fixed paraffin-embedded (FFPE) solid tumor samples or cell-free total nucleic acid (cfTNA) from liquid biopsy samples. The Genexus Integrated Sequencer is a fully automated system requiring minimal touch points and hands-on time, allowing a novice user to go from nucleic acid to variant calls for somatic variant testing across multiple cancer types in less than two days. Methods: The Oncomine Precision Assay is a new amplicon-based assay targeting specific somatic variants in 50 genes with coverage for multiple cancer types. The assay uses AmpliSeq HD chemistry, which distinguishes true biological variants in the sample from errors generated during library preparation, templating, and sequencing through incorporation of molecular tags during target amplification. With about 15 minutes of hands-on time, a run is set up using pre-filled reagent strips for a fully automated run that includes library prep, templating, sequencing, variant calling, and a final report if desired. Results: Reported here are the results generated from an early external test site along with development data. The Oncomine Precision Assay is designed to detect somatic variants in 50 unique genes, testing all major variant types important in oncology research. The content was selected based on published accounts of target actionability and prevalence across multiple cancer types. All major variant types are targeted, including SNVs, insertions, deletions, copy number variation, fusion transcripts, and alternate splice forms. Data is shared from an external test lab using the Oncomine Precision Assay on the Genexus Integrated Sequencer with control and research samples.
Results from both multiplexed FFPE and liquid biopsy runs are presented. Data demonstrates use of a single assay and system to effectively call variants from both FFPE and liquid biopsy sample types with a turnaround time of less than 30 hours. Conclusion: The Oncomine Precision Assay and Genexus Integrated Sequencer enable detection of key oncology variants in 50 genes using either solid tissue or liquid biopsy samples as input. This fully automated solution for oncology research generates variant calls from nucleic acid input in less than 2 days with minimal hands-on time and touch points from the user. Many features of the automated system increase success rates by ensuring the appropriate reagents are properly installed before a run. The user-friendly, highly automated, and fully optimized sample-to-answer system described here has great potential for targeted oncology sequencing in the research setting. Citation Format: Jian Gu, Ru Cao, Jeff Schageman, Kris Lea, Priyanka Kshatriya, Amir Marcovitz, Paul Williams, Rasika Sunnadeniya, Varun Bagai, Khalid Hanif, Jose L. Costa, Kelli Bramlett. Demonstration of the genexus integrated sequencing system with the oncomine precision assay [abstract]. In: Proceedings of the Annual Meeting of the American Association for Cancer Research 2020; 2020 Apr 27-28 and Jun 22-24. Philadelphia (PA): AACR; Cancer Res 2020;80(16 Suppl):Abstract nr 213.
Significance Everybody loves dolphins. And orcas. And Mx (Myxovirus) genes. Mx genes are important immune genes that help mammals fight many RNA and DNA viruses, including HIV, measles, and flu. We make a surprising discovery: dolphins, orcas, and likely all toothed whales lost both Mx genes soon after they diverged from baleen whales and ungulates, which preserve these important genes intact. Because both genes were likely lost simultaneously, we speculate that a viral outbreak exploiting the Mx genes may have forced the toothed whale’s ancestor to sacrifice both. Because the Mx genes are so important, and because all 56 sequenced mammals other than toothed whales carry Mx genes, our discovery makes an important contribution to help preserve these magnificent mammals.