Summary Ku70 and Ku80 form Ku, a ring-shaped protein that initiates the non-homologous end-joining (NHEJ) DNA repair pathway. 1 Specifically, Ku binds to double-stranded DNA (dsDNA) ends and recruits other NHEJ factors (e.g., DNA-PKcs and LIG4). While Ku binds to double-stranded RNA (dsRNA) 2 and traps mutated DNA-PKcs on ribosomal RNA in vivo, 3,4 the physiological significance of Ku-dsRNA interactions in otherwise wild-type cells remains elusive. Intriguingly, while dispensable for murine development, 5,6 Ku is essential in human cells. 7 Despite similar genome sizes, human cells express ∼100-fold more Ku than mouse cells, implying functions beyond NHEJ, possibly through a dose-sensitive interaction with dsRNA, which Ku binds ∼100 times more weakly than dsDNA. 2,8 While investigating the essentiality of Ku in human cells, we found that depletion of Ku, unlike depletion of LIG4, induces profound interferon (IFN) and NF-κB responses reliant on the dsRNA sensors MDA5/RIG-I and the adaptor MAVS. Prolonged Ku degradation also activates other dsRNA sensors, e.g., PKR, which suppresses protein translation, and OAS/RNaseL, which cleaves rRNAs and eventually induces growth arrest and cell death. MAVS, RIG-I, or MDA5 knockouts suppressed IFN signaling and, together with PKR knockouts, partially rescued Ku-depleted human cells. Ku-irCLIP analyses revealed that Ku binds to diverse dsRNA, predominantly stem-loops in primate-specific Alu elements 9 in antisense orientation in introns and 3’-UTRs. Ku expression rose sharply in higher primates, tightly correlating with Alu expansion (r = 0.94/0.95). Together, our study identified a vital role of Ku in accommodating Alu expansion in primates by mitigating a dsRNA-induced innate immune response, explaining the rise of Ku levels and its essentiality in human cells.
The DNA-binding AT-rich interactive domain (ARID) exists in a wide range of proteins throughout eukaryotic kingdoms. ARID domain-containing proteins are involved in manifold biological processes, such as transcriptional regulation, cell cycle control and chromatin remodeling. Their individual domain composition allows for a sub-classification within higher mammals. ARID is categorized as a binder of double-stranded AT-rich DNA, while recent work has suggested that ARIDs are capable of binding other DNA motifs and also of recognizing RNA. Despite broad variability at the primary sequence level, ARIDs show a highly conserved fold, which consists of six α-helices and two loop regions. Interestingly, this minimal core domain is often found extended by helices at the N- and/or C-terminus with potential roles in target specificity and, subsequently, function. While high-resolution structural information from various types of ARIDs has accumulated over two decades now, there is limited access to ARID-DNA complex structures. We thus find ourselves at the beginning of understanding ARID domain target specificities and the role of accompanying domains. Here, we systematically summarize ARID domain conservation and compare the various types with a focus on their structural differences and DNA-binding preferences, including the context of multiple other motifs within ARID domain-containing proteins.
Abstract AT-rich interacting domain (ARID)-containing proteins, Arids, are a heterogeneous DNA-binding protein family involved in transcription regulation and chromatin processing. For the member Arid5a, no exact DNA-binding preference has been experimentally defined so far. Additionally, the protein binds to mRNA motifs for transcript stabilization, supposedly through the DNA-binding ARID domain. To date, however, neither an unbiased RNA motif definition nor a clear dissection of nucleic acid-binding through the ARID domain has been undertaken. Using NMR-centered biochemistry, we here define the Arid5a DNA preference. Further, high-throughput in vitro binding (RBNS) reveals a consensus RNA-binding motif engaged by the core ARID domain. Finally, transcriptome-wide binding (iCLIP2) reveals that Arid5a has a weak preference for (A)U-rich regions in pre-mRNA transcripts of factors related to RNA processing. We find that the intrinsically disordered regions (IDRs) flanking the ARID domain modulate the specificity and affinity of DNA-binding, while they appear crucial for RNA interactions. Ultimately, our data suggest that Arid5a uses its extended ARID domain for bi-functional gene regulation and that the involvement of IDR extensions is a more general feature of Arids in interacting with different nucleic acids at the chromatin-mRNA interface.
Abstract The nucleocapsid protein (N) of SARS-CoV-2 plays a pivotal role during the viral life cycle. It is involved in RNA transcription and accounts for packaging of the large genome into virus particles. N manages the enigmatic balance of bulk RNA-coating versus precise RNA-binding to designated cis-regulatory elements. Numerous studies report the involvement of its disordered segments in non-selective RNA-recognition, but how N organizes the inevitable recognition of specific motifs remains unanswered. We here use NMR spectroscopy to systematically analyze the interactions of N’s N-terminal RNA-binding domain (NTD) with individual cis RNA elements clustering in the SARS-CoV-2 regulatory 5’-genomic end. Supported by broad solution-based biophysical data, we unravel the NTD RNA-binding preferences in the natural genome context. We show that the domain uses a set of flexible sensory residues to read the intrinsic signature of preferred RNA elements for selective and stable complex formation within the large pool of available motifs.
Abstract The family of scaffold attachment factor B (SAFB) proteins comprises three members, which were first identified as binders of the nuclear matrix/scaffold. Over the past two decades, SAFBs were shown to act in DNA repair, mRNA/(l)ncRNA processing, and as part of protein complexes with chromatin-modifying enzymes. SAFB proteins are dual nucleic acid-binding proteins of approximately 100 kDa with dedicated domains in an otherwise largely unstructured context, but whether and how they discriminate DNA- and RNA-binding has remained enigmatic. We here provide the SAFB2 DNA- and RNA-binding SAP and RRM domains in their functional boundaries and use solution NMR spectroscopy to ascribe DNA- and RNA-binding functions. We give insight into their target nucleic acid preferences and map the interfaces with respective nucleic acids on sparse data-derived SAP and RRM domain structures. Further, we provide evidence that the SAP domain exhibits intra-domain dynamics and a potential tendency to dimerise, which may expand its specifically targeted DNA sequence range. Our data provide a first molecular basis of, and a starting point towards deciphering, the DNA- and RNA-binding functions of SAFB2 and serve as a basis for understanding its localization to specific regions of chromatin and its involvement in the processing of specific RNA species.
SARS-CoV-2 (SCoV2) and its variants of concern pose serious challenges to public health. The variants pose increasing challenges to vaccines, necessitating the development of new intervention strategies, including antivirals. Within the international Covid19-NMR consortium, we have identified binders targeting the RNA genome of SCoV2. We established protocols for the production and NMR characterization of more than 80% of all SCoV2 proteins. Here, we performed an NMR screening using a fragment library for binding to 25 SCoV2 proteins and identified hits also against previously unexplored SCoV2 proteins. Computational mapping was used to predict binding sites and identify functional moieties (chemotypes) of the ligands occupying these pockets. Striking consensus was observed between NMR-detected binding sites of the main protease and the computational procedure. Our investigation provides novel structural and chemical space for structure-based drug design against the SCoV2 proteome.
The current outbreak of the highly infectious COVID-19 respiratory disease is caused by the novel coronavirus SARS-CoV-2 (Severe Acute Respiratory Syndrome Coronavirus 2). To fight the pandemic, the search for promising viral drug targets has become a shared, cross-border goal of the international biomedical research community. Within the international Covid19-NMR consortium, scientists support drug development against SARS-CoV-2 by providing publicly available NMR data on viral proteins and RNAs. The coronavirus nucleocapsid protein (N protein) is an RNA-binding protein involved in viral transcription and replication. Its primary function is the packaging of the viral RNA genome. The highly conserved architecture of the coronavirus N protein consists of an N-terminal RNA-binding domain (NTD), followed by an intrinsically disordered serine/arginine (SR)-rich linker and a C-terminal dimerization domain (CTD). Besides its involvement in oligomerization, the CTD of the N protein (N-CTD) is also able to bind to nucleic acids by itself, independent of the NTD. Here, we report the near-complete NMR backbone chemical shift assignments of the SARS-CoV-2 N-CTD to provide the basis for downstream applications, in particular site-resolved drug binding studies.