Abstract Background: Short tandem repeats (STRs) are widely distributed across the human genome and are associated with numerous neurological disorders. However, the extent to which STRs contribute to disease is likely underestimated because of the challenges of calling these variants in short-read next-generation sequencing data. Several computational tools have been developed for STR variant calling, but none fully address all of the complexities associated with this variant class. Results: Here we introduce LUSTR, which is designed to address some of the challenges associated with STR variant calling by enabling more flexibility in defining STR loci, allowing for customizable modules to tailor analyses, and expanding the capability to call somatic and multiallelic STR variants. LUSTR is a user-friendly and easily customizable tool for targeted or unbiased genome-wide STR variant screening that can use either predefined or novel genome builds. Using both simulated and real data sets, we demonstrated that LUSTR accurately infers germline and somatic STR expansions in individuals with and without diseases. Conclusions: LUSTR offers a powerful and user-friendly approach that allows for the identification of STR variants and can facilitate more comprehensive studies evaluating the role of pathogenic STR variants across human diseases.
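To make the notion of an STR locus and its copy number concrete, the following is a minimal Python sketch that scans a DNA string for perfect tandem repeats. It is not LUSTR's algorithm: the function name `find_strs` and its parameters (`min_unit`, `max_unit`, `min_repeats`) are hypothetical, and a real caller works from aligned reads and has to tolerate interruptions and sequencing errors.

```python
import re


def find_strs(seq, min_unit=2, max_unit=6, min_repeats=3):
    """Return (start, unit, copies) for each perfect tandem repeat in seq.

    Illustrative only: the unit-length bounds and the minimum copy number
    are assumed defaults, not values taken from LUSTR.
    """
    # A lazily matched unit of min_unit..max_unit bases, followed by at
    # least (min_repeats - 1) further copies of that same unit.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits
```

For example, `find_strs("TTGA" + "CAG" * 5 + "TTCG")` reports a single CAG tract of five copies starting at offset 4. Comparing such copy numbers against a reference build would be one simple way to flag a candidate expansion.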
Abstract Gene losses provide an insightful route for studying the morphological and physiological adaptations of species, but their discovery is challenging. Existing genome annotation tools focus on annotating intact genes and do not attempt to distinguish nonfunctional genes from genes missing annotation due to sequencing and assembly artifacts. Previous attempts to annotate gene losses have required significant manual curation, which hampers their scalability for the ever-increasing deluge of newly sequenced genomes. Using extreme sequence erosion (amino acid deletions and substitutions) and sister species support as an unambiguous signature of loss, we developed an automated approach for detecting high-confidence gene loss events across a species tree. Our approach relies solely on gene annotation in a single reference genome, raw assemblies for the remaining species to analyze, and the associated phylogenetic tree for all organisms involved. Using human as reference, we discovered over 400 unique human ortholog erosion events across 58 mammals. This includes dozens of clade-specific losses of genes that result in early mouse lethality or are associated with severe human congenital diseases. Our discoveries yield intriguing potential for translational medical genetics and evolutionary biology, and our approach is readily applicable to large-scale genome sequencing efforts across the tree of life.
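The idea of requiring sister species support can be illustrated with a small Python sketch. It assumes that each species already has an erosion score for the gene (here, a hypothetical fraction of reference amino acids deleted or substituted) and that the phylogeny is given as nested tuples. The 0.5 threshold, the species names and the function names are assumptions made for illustration, not the authors' implementation.

```python
def leaves(node):
    """List the species names under a node of a nested-tuple tree."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def eroded_clades(tree, erosion, threshold=0.5):
    """Return maximal clades of two or more species that are all eroded.

    erosion maps species name to an assumed erosion score in [0, 1].
    Species missing from the mapping are treated as not eroded. A single
    eroded species is never reported, because it lacks sister support.
    """
    if isinstance(tree, str):
        return []
    members = leaves(tree)
    if all(erosion.get(species, 0.0) >= threshold for species in members):
        return [sorted(members)]
    found = []
    for child in tree:
        found.extend(eroded_clades(child, erosion, threshold))
    return found
```

With the tree `((("human", "chimp"), "mouse"), ("dog", "cat"))` and high erosion scores only for dog and cat, the function returns the dog and cat clade as a single supported event. If only mouse were eroded, nothing would be reported, since an isolated signal could equally be an assembly artifact.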
Mutations of genes within the phosphatidylinositol-3-kinase (PI3K)-AKT-MTOR pathway are well known causes of brain overgrowth (megalencephaly) as well as segmental cortical dysplasia (such as hemimegalencephaly, focal cortical dysplasia and polymicrogyria). To date, mutations of the AKT3 gene have been reported in a few individuals with brain malformations. Therefore, our understanding regarding the clinical and molecular spectrum associated with mutations of this critical gene is limited, with no clear genotype–phenotype correlations. We sought to further delineate this spectrum, study levels of mosaicism and identify genotype–phenotype correlations of AKT3-related disorders. We performed targeted sequencing of AKT3 on individuals with these phenotypes by molecular inversion probes and/or Sanger sequencing to determine the type and level of mosaicism of mutations. We analysed all clinical and brain imaging data of mutation-positive individuals including neuropathological analysis in one instance. We performed ex vivo kinase assays on AKT3 engineered with the patient mutations and examined the phospholipid binding profile of pleckstrin homology domain localizing mutations. We identified 14 new individuals with AKT3 mutations with several phenotypes dependent on the type of mutation and level of mosaicism. Our comprehensive clinical characterization, and review of all previously published patients, broadly segregates individuals with AKT3 mutations into two groups: patients with highly asymmetric cortical dysplasia caused by the common p.E17K mutation, and patients with constitutional AKT3 mutations exhibiting more variable phenotypes including bilateral cortical malformations, polymicrogyria, periventricular nodular heterotopia and diffuse megalencephaly without cortical dysplasia. All mutations increased kinase activity, and pleckstrin homology domain mutants exhibited enhanced phospholipid binding.
Overall, our study shows that activating mutations of the critical AKT3 gene are associated with a wide spectrum of brain involvement ranging from focal or segmental brain malformations (such as hemimegalencephaly and polymicrogyria) predominantly due to mosaic AKT3 mutations, to diffuse bilateral cortical malformations, megalencephaly and heterotopia due to constitutional AKT3 mutations. We also provide the first detailed neuropathological examination of a child with extreme megalencephaly due to a constitutional AKT3 mutation. This child has one of the largest documented paediatric brain sizes, to our knowledge. Finally, our data show that constitutional AKT3 mutations are associated with megalencephaly, with or without autism, similar to PTEN-related disorders. Recognition of this broad clinical and molecular spectrum of AKT3 mutations is important for providing early diagnosis and appropriate management of affected individuals, and will facilitate targeted design of future human clinical trials using PI3K-AKT pathway inhibitors.
Abstract In an age where commercial entities are allowed to collect and directly profit from large amounts of private information, an age where large data breaches of such organizations are discovered every month, science must strive to offer society viable ways to preserve privacy while benefitting from the power of data sharing. Patient phenotypes and genotypes are critical for building groups of phenotypically similar patients, identifying the gene that best explains their common phenotypes, and ultimately diagnosing a patient with a Mendelian disease. Direct computation over these quantities requires highly sensitive patient data to be shared openly, compromising patient privacy and opening patients up to discrimination. Existing protocols focus on secure computation over genotype data and only address the final steps of the disease-diagnosis pipeline, where phenotypically similar patients have been identified. However, identifying such patients in a secure and private manner remains an open problem. In this work, we develop secure protocols to maintain patient privacy while computing meaningful operations over both genotypic and phenotypic data for two real scenarios: COHORT DISCOVERY and GENE PRIORITIZATION. Our protocols newly enable a complete and secure end-to-end disease diagnosis pipeline that protects sensitive patient phenotypic and genotypic data.
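The abstract does not say which cryptographic primitives the protocols rely on. Purely as an illustration of how several parties can compute over private values without revealing them, the sketch below uses additive secret sharing, a standard building block that is swapped in here as an assumption and is not claimed to be the authors' protocol. The modulus `PRIME` and the function names are hypothetical.

```python
import secrets

# Assumed modulus for the illustration; any prime larger than the true sum works.
PRIME = 2**61 - 1


def share(value, n_parties):
    """Split value into n_parties random shares that sum to it mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def reconstruct(shares):
    """Recombine shares; any strict subset of them reveals nothing."""
    return sum(shares) % PRIME


def secure_sum(private_values, n_parties=3):
    """Sum private inputs while each party only ever sees random shares."""
    per_party = [0] * n_parties
    for value in private_values:
        for index, piece in enumerate(share(value, n_parties)):
            per_party[index] = (per_party[index] + piece) % PRIME
    return reconstruct(per_party)
```

As a usage example, three clinics could each contribute the number of their patients showing a given phenotype, and `secure_sum([3, 5, 7])` would yield the cohort size of 15 without any clinic disclosing its own count. A full cohort discovery or gene prioritization protocol would need considerably more than this single primitive.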
Genetic studies have identified a core set of transcription factors and target genes that control the development of the neocortex, the region of the human brain responsible for higher cognition. The specific regulatory interactions between these factors, many key upstream and downstream genes, and the enhancers that mediate all these interactions remain mostly uncharacterized. We perform p300 ChIP-seq to identify over 6,600 candidate enhancers active in the dorsal cerebral wall of embryonic day 14.5 (E14.5) mice. Over 95% of the peaks we measure are conserved to human. Eight of ten (80%) candidates tested using mouse transgenesis drive activity in restricted laminar patterns within the neocortex. GREAT-based computational analysis reveals highly significant correlation with genes expressed at E14.5 in key areas for neocortex development, and allows the grouping of enhancers by known biological functions and pathways for further studies. We find that multiple genes are flanked by dozens of candidate enhancers each, including well-known key neocortical genes as well as suspected and novel genes. Nearly a quarter of our candidate enhancers are conserved well beyond mammals. Human and zebrafish regions orthologous to our candidate enhancers are shown to most often function in other aspects of central nervous system development. Finally, we find strong evidence that specific interspersed repeat families have contributed potentially key developmental enhancers via co-option. Our analysis expands the methodologies available for extracting the richness of information found in genome-wide functional maps.
Diphthamide is a post-translationally modified histidine essential for messenger RNA translation and ribosomal protein synthesis. We present evidence for DPH5 as a novel cause of embryonic lethality and profound neurodevelopmental delays (NDDs). Molecular testing was performed using exome or genome sequencing. A targeted Dph5 knockin mouse (C57BL/6Ncrl-Dph5em1Mbp/Mmucd) was created for a DPH5 p.His260Arg homozygous variant identified in 1 family. Adenosine diphosphate-ribosylation assays in DPH5-knockout human and yeast cells and in silico modeling were performed for the identified DPH5 potential pathogenic variants. DPH5 variants p.His260Arg (homozygous), p.Asn110Ser and p.Arg207Ter (heterozygous), and p.Asn174LysfsTer10 (homozygous) were identified in 3 unrelated families with distinct overlapping craniofacial features, profound NDDs, multisystem abnormalities, and miscarriages. Dph5 p.His260Arg homozygous knockin was embryonically lethal with only 1 subviable mouse exhibiting impaired growth, craniofacial dysmorphology, and multisystem dysfunction recapitulating the human phenotype. Adenosine diphosphate-ribosylation assays showed absent to decreased function in DPH5-knockout human and yeast cells. In silico modeling of the variants showed altered DPH5 structure and disruption of its interaction with eEF2. We provide strong clinical, biochemical, and functional evidence for DPH5 as a novel cause of embryonic lethality or profound NDDs with multisystem involvement and expand diphthamide-deficiency syndromes and ribosomopathies.
Distantly related species entering similar biological niches often adapt by evolving similar morphological and physiological characters. How much genomic molecular convergence (particularly of highly constrained coding sequence) contributes to convergent phenotypic evolution, such as echolocation in bats and whales, is a long-standing fundamental question. Like others, we find that convergent amino acid substitutions are not more abundant in echolocating mammals compared to their outgroups. However, we also ask a more informative question about the genomic distribution of convergent substitutions by devising a test to determine which, if any, of more than 4,000 tissue-affecting gene sets is most statistically enriched with convergent substitutions. We find that the gene set most overrepresented (q-value = 2.2e-3) with convergent substitutions in echolocators, affecting 18 genes, regulates development of the cochlear ganglion, a structure with empirically supported relevance to echolocation. Conversely, when comparing to nonecholocating outgroups, no significant gene set enrichment exists. For aquatic and high-altitude mammals, our analysis highlights 15 and 16 genes from the gene sets most affected by molecular convergence which regulate skin and lung physiology, respectively. Importantly, our test requires that the most convergence-enriched set cannot also be enriched for divergent substitutions, such as in the pattern produced by inactivated vision genes in subterranean mammals. Showing a clear role for adaptive protein-coding molecular convergence, we discover nearly 2,600 convergent positions, highlight 77 of them in 3 organs, and provide code to investigate other clades across the tree of life.
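A gene set enrichment test with q-values of the kind described above can be sketched as follows. The sketch assumes a one-sided hypergeometric test per gene set followed by Benjamini-Hochberg correction. The paper's actual statistic may differ, and the additional requirement that the top set not also be enriched for divergent substitutions is not modelled. The function names are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for a hypergeometric draw.

    N: genes tested, K: genes in the set,
    n: genes carrying convergent substitutions, k: overlap observed.
    """
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def bh_qvalues(pvalues):
    """Benjamini-Hochberg q-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        qvalues[index] = running_min
    return qvalues
```

In use, one p-value would be computed per gene set with `hypergeom_sf`, the full list passed to `bh_qvalues`, and the set with the smallest q-value reported only if it falls below the chosen significance level. Correcting across all sets matters here because several thousand sets are tested at once.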