ABSTRACT Nuclear speckles are self-assembled organelles composed of RNAs and proteins. They are proposed to act as structural domains that control distinct steps in gene expression, including transcription, splicing and mRNA export. Earlier studies identified differential localization of a few components within the speckles. It was speculated that the spatial organization of speckle components might contribute directly to the order of operations that coordinate distinct processes. Here, by performing multi-color structured illumination microscopy, we characterized the multilayer organization of speckles at a higher resolution. We found that SON and SC35 (also known as SRSF2) localize to the central region of the speckle, whereas MALAT1 and small nuclear (sn)RNAs are enriched at the speckle periphery. Coarse-grained simulations indicate that the non-random organization arises due to the interplay between favorable sequence-encoded intermolecular interactions of speckle-resident proteins and RNAs. Finally, we observe positive correlation between the total amount of RNA present within a speckle and the speckle size. These results imply that speckle size may be regulated to accommodate RNA accumulation and processing. Accumulation of RNA from various actively transcribed speckle-associated genes could contribute to the observed speckle size variations within a single cell.
The conformational ensemble and function of intrinsically disordered proteins (IDPs) are sensitive to their solution environment. The inherent malleability of disordered proteins, combined with the exposure of their residues, accounts for this sensitivity. One context in which IDPs play important roles that are concomitant with massive changes to the intracellular environment is during desiccation (extreme drying). The ability of organisms to survive desiccation has long been linked to the accumulation of high levels of cosolutes such as trehalose or sucrose as well as the enrichment of IDPs, such as late embryogenesis abundant (LEA) proteins or cytoplasmic abundant heat-soluble (CAHS) proteins. Despite knowing that IDPs play important roles and are co-enriched alongside endogenous, species-specific cosolutes during desiccation, little is known mechanistically about how IDP-cosolute interactions influence desiccation tolerance. Here, we test the notion that the protective function of desiccation-related IDPs is enhanced through conformational changes induced by endogenous cosolutes. We find that desiccation-related IDPs derived from four different organisms spanning two LEA protein families and the CAHS protein family synergize best with endogenous cosolutes during drying to promote desiccation protection. Yet the structural parameters of protective IDPs do not correlate with synergy for either CAHS or LEA proteins. We further demonstrate that for CAHS, but not LEA proteins, synergy is related to self-assembly and the formation of a gel. Our results suggest that functional synergy between IDPs and endogenous cosolutes is a convergent desiccation protection strategy seen among different IDP families and organisms, yet the mechanisms underlying this synergy differ between IDP families.
Abstract Cone-Rod Homeobox, encoded by CRX , is a transcription factor (TF) essential for the terminal differentiation and maintenance of mammalian photoreceptors. Structurally, CRX comprises an ordered DNA-binding homeodomain and an intrinsically disordered transcriptional effector domain. Although a handful of human variants in CRX have been shown to cause several different degenerative retinopathies with varying cone and rod predominance, as with most human disease genes the vast majority of observed CRX genetic variants are uncharacterized variants of uncertain significance (VUS). We performed a deep mutational scan (DMS) of nearly all possible single amino acid substitution variants in CRX, using an engineered cell-based transcriptional reporter assay. We measured the ability of each CRX missense variant to transactivate a synthetic fluorescent reporter construct in a pooled fluorescence-activated cell sorting assay and compared the activation strength of each variant to that of wild-type CRX to compute an activity score, identifying thousands of variants with altered transcriptional activity. We calculated a statistical confidence for each activity score derived from multiple independent measurements of each variant marked by unique sequence barcodes, curating a high-confidence list of nearly 2,000 variants with significantly altered transcriptional activity compared to wild-type CRX. We evaluated the performance of the DMS assay as a clinical variant classification tool using gold-standard classified human variants from ClinVar, and determined that activity scores could be used to identify pathogenic variants with high specificity. That this performance could be achieved using a synthetic reporter assay in a foreign cell type, even for a highly cell type-specific TF like CRX, suggests that this approach shows promise for DMS of other TFs that function in cell types that are not easily accessible. Per-position average activity scores closely aligned to a predicted structure of the ordered homeodomain and demonstrated position-specific residue requirements. The intrinsically disordered transcriptional effector domain, by contrast, displayed a qualitatively different pattern of substitution effects, following compositional constraints without specific residue position requirements in the peptide chain. The observed compositional constraints of the effector domain were consistent with the acidic exposure model of transcriptional activation. Together, the results of the CRX DMS identify molecular features of the CRX effector domain and demonstrate clinical utility for variant classification.
Abstract Low-complexity protein domains promote the formation of various biomolecular condensates. However, in many cases, the precise sequence features governing condensate formation and identity remain unclear. Here, we investigate the role of intrinsically disordered mixed-charge domains (MCDs) in nuclear speckle condensation. Proteins composed exclusively of arginine/aspartic-acid dipeptide repeats undergo length-dependent condensation and speckle incorporation. Substituting arginine with lysine in synthetic and natural speckle-associated MCDs abolishes these activities, identifying a key role for multivalent contacts through arginine’s guanidinium ion. MCDs can synergise with a speckle-associated RNA recognition motif to promote speckle specificity and residence. MCD behaviour is tuneable through net-charge: increasing negative charge abolishes condensation and speckle incorporation. By contrast, increasing positive charge through arginine leads to enhanced condensation, speckle enlargement, decreased splicing factor mobility, and defective mRNA export. Together, these results identify key sequence determinants of MCD-promoted speckle condensation, and link the speckle’s dynamic material properties with function in mRNA processing.
The emergence of high-throughput experiments and high-resolution computational predictions has led to an explosion in the quality and volume of protein sequence annotations at proteomic scales. Unfortunately, sanity checking, integrating, and analyzing complex sequence annotations remains logistically challenging and introduces a major barrier to entry for even superficial integrative bioinformatics.To address this technical burden, we have developed SHEPHARD, a Python framework that trivializes large-scale integrative protein bioinformatics. SHEPHARD combines an object-oriented hierarchical data structure with database-like features, enabling programmatic annotation, integration, and analysis of complex datatypes. Importantly SHEPHARD is easy to use and enables a Pythonic interrogation of largescale protein datasets with millions of unique annotations. We use SHEPHARD to examine three orthogonal proteome-wide questions relating protein sequence to molecular function, illustrating its ability to uncover novel biology.We provided SHEPHARD as both a stand-alone software package (https://github.com/holehouse-lab/shephard), and as a Google Colab notebook with a collection of precomputed proteome-wide annotations (https://github.com/holehouse-lab/shephard-colab).
Abstract Background Biomolecular condensates are non-stoichiometric assemblies that are characterized by their capacity to spatially concentrate biomolecules and play a key role in cellular organization. Proteins that drive the formation of biomolecular condensates frequently contain oligomerization domains and intrinsically disordered regions (IDRs), both of which can contribute multivalent interactions that drive higher-order assembly. Our understanding of the relative and temporal contribution of oligomerization domains and IDRs to the material properties of in vivo biomolecular condensates is limited. Similarly, the spatial and temporal dependence of protein oligomeric state inside condensates has been largely unexplored in vivo. Methods In this study, we combined quantitative microscopy with number and brightness analysis to investigate the aging, material properties, and protein oligomeric state of biomolecular condensates in vivo. Our work is focused on condensates formed by AUXIN RESPONSE FACTOR 19 (ARF19), a transcription factor integral to the auxin signaling pathway in plants. ARF19 contains a large central glutamine-rich IDR and a C-terminal Phox Bem1 (PB1) oligomerization domain and forms cytoplasmic condensates. Results Our results reveal that the IDR amino acid composition can influence the morphology and material properties of ARF19 condensates. In contrast the distribution of oligomeric species within condensates appears insensitive to the IDR composition. In addition, we identified a relationship between the abundance of higher- and lower-order oligomers within individual condensates and their apparent fluidity. Conclusions IDR amino acid composition affects condensate morphology and material properties. In ARF condensates, altering the amino acid composition of the IDR did not greatly affect the oligomeric state of proteins within the condensate.