The birth of a new, distinct phase is encountered in a myriad of processes and has wide-ranging consequences in materials processing, biological self-assembly, separations, and other areas. Several phase transitions are nucleation driven. Nucleation events occur over nanosecond timescales and involve hundreds to thousands of molecules. These length and time scales are difficult to access in experiments, which makes experimental studies of nucleation challenging. Molecular simulations, on the other hand, sample nanosecond and nanometer scales, making them well suited to studying nucleation. However, nucleation is a rare event: the waiting time to observe a single nucleation event is long relative to accessible simulation times, which makes simulation studies challenging as well. This project took a multi-pronged approach to these challenges in order to develop the next generation of rare event sampling methods for molecular simulations. The key outcomes include more effective methods for sampling rare events, the use of machine learning to better elucidate nucleation mechanisms, software for easy implementation of the methodologies, and applications to realistic systems that push the methods beyond model systems. Overall, this work has advanced the frontiers of molecular simulations of rare events, with a focus on nucleation in aqueous solutions.
We compare the free energies of adsorption (∆A_ads) and the structural preferences of amino acids obtained using four force fields: Amberff99SB-ILDN/TIP3P, CHARMM36/modified-TIP3P, OPLS-AA/M/TIP3P, and Amber03w/TIP4P/2005. The amino acid–graphene interactions are favorable irrespective of the force field. While the magnitudes of ∆A_ads differ between the force fields, the trends in the free energy of adsorption across amino acids are similar for all the force fields studied. ∆A_ads correlates positively with amino acid–graphene interaction energies and negatively with graphene–water interaction energies. Using a combination of principal component analysis and a density-based clustering technique, we grouped the structures observed in the graphene-adsorbed state. The resulting cluster populations, and the conformations in each cluster, indicate that the structures of the amino acids in the graphene-adsorbed state vary across force fields. For all the force fields, the differences in amino acid conformations are more severe in the graphene-adsorbed state than in the bulk state. Our findings suggest that while the thermodynamics of adsorption of proteins and peptides would be described consistently across different force fields, the structural preferences of peptides and proteins on graphene will be force field dependent.
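The clustering step mentioned above can be illustrated with a minimal sketch. It assumes the structures have already been projected onto two principal components, and it uses DBSCAN as the density-based algorithm; the abstract does not name the algorithm, so that choice, the function name `dbscan`, and the parameters `eps` and `min_pts` are illustrative assumptions rather than the authors' protocol.

```python
import math

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise.

    points  : list of coordinate tuples (assumed here: PCA projections)
    eps     : neighborhood radius (hypothetical parameter)
    min_pts : minimum neighbors, including the point itself, for a core point
    """
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # Brute-force neighbor search; adequate for a small illustration.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # only core points expand the cluster
    return labels

# Toy data: two dense groups and one isolated point.
pts = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1), (10, 0)]
labels = dbscan(pts, eps=0.5, min_pts=3)
```

Cluster populations then follow from counting how often each label occurs, and points labeled -1 are treated as unclustered.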
Forward flux sampling (FFS) is an established scientific method for sampling rare events in molecular simulations. However, as the difficulty of the scientific problem increases, the amount of data and the number of tasks required for FFS is challenging to manage with traditional scripting tools and languages for high performance computing. The SAFFIRE software framework has been developed to address these challenges. SAFFIRE utilizes Hadoop to manage a large number of tasks and data for large scale FFS simulations. The framework is shown to be highly scalable and able to support large scale FFS simulations. This enables studies of rare events in complex molecular systems on commodity cluster computing systems.
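To make the bookkeeping behind FFS concrete, the sketch below combines per-interface trial counts into a rate estimate using the standard FFS expression: the flux through the first interface multiplied by the conditional probabilities of reaching each subsequent interface. It is not part of SAFFIRE; the function name `ffs_rate` and its arguments are hypothetical.

```python
import math

def ffs_rate(initial_flux, successes, trials):
    """Estimate a rate as flux * product of interface crossing probabilities.

    initial_flux : flux of trajectories through the first interface (> 0)
    successes[i] : trial runs from interface i that reached interface i+1
    trials[i]    : total trial runs launched from interface i
    """
    if initial_flux <= 0:
        raise ValueError("initial_flux must be positive")
    if len(successes) != len(trials):
        raise ValueError("successes and trials must have the same length")
    # Accumulate in log space because the product can be extremely small.
    log_rate = math.log(initial_flux)
    for s, m in zip(successes, trials):
        if m <= 0 or s < 0 or s > m:
            raise ValueError("invalid counts")
        if s == 0:
            return 0.0  # no trial crossed this interface
        log_rate += math.log(s / m)
    return math.exp(log_rate)

rate = ffs_rate(2.0, successes=[50, 25], trials=[100, 100])
```

Every trial run launched from every interface contributes to this estimate, which is why a large FFS campaign generates many tasks and output files that all have to complete.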
The theory and simulation of systems that have realistic complexity and size and evolve across massive time scales are a critical challenge predicated upon the accurate description of many-body interactions. Meeting this challenge builds upon the science of the small to create a new "Middle Science" whose research vision integrates modern math and data science with chemical theories, as proposed in this In Focus article.
Many computational science applications utilize complex workflow patterns that generate an intricately connected set of output files for subsequent analysis. Some types of applications, such as rare event sampling, additionally require guaranteed completion of all subtasks for analysis, and place significant demands on the workflow management and execution environment. SciFlow is a user interface built over the Hadoop infrastructure that provides a framework to support the complex process and data interactions and guaranteed completion requirements of scientific workflows. It provides an efficient mechanism for building a parallel scientific application with dataflow patterns, and enables the design, deployment, and execution of data-intensive, many-task computing workloads on a Hadoop platform. The design principles of this framework emphasize simplicity, scalability, and fault tolerance. A case study using the forward flux sampling rare event simulation application validates the functionality, reliability, and effectiveness of the framework.
The melanocortin receptors are a class of centrally and peripherally expressed G protein-coupled receptors, of which the MC3R and MC4R subtypes are implicated in the regulation of appetite and energy homeostasis and can serve as potential therapeutic targets for disorders such as obesity and cachexia. An unbiased high-throughput mixture-based library screen was implemented to identify novel ligands with an emphasis on the identification of nanomolar-potent agonists of the mouse melanocortin-3 receptor. This screen yielded the discovery of an N-branched tricyclic guanidine scaffold (TPI2408) that contained three nanomolar potent mMC3R agonists and additional compounds that possessed antagonism for the mMC4R. The antagonist character of this scaffold library at the mMC4R was confirmed by a follow-up positional scanning antagonist screen. Additionally, molecular dynamics simulations herein provide mechanistic insight into the polypharmacological characteristics of melanocortin receptors. The disclosed materials have the potential to serve as important tools and SAR scaffolds in the study of melanocortin receptor function.
This project leverages advances in machine learning based data analysis techniques and untargeted omic analytical methods to progress nuclear nonproliferation technologies beyond current capabilities. The developed approaches can be used to identify and detect complex chemical fingerprints of facilities of interest. These techniques have been developed for fields such as metabolomics and genomics but have not been applied to nuclear nonproliferation applications. Adaptation of these techniques for volatile organic compound analysis has far reaching application within the scientific community including environmental chemistry, atmospheric physics, and climate sciences.
Cationic micelles, composed of amphiphilic block copolymers with polycationic coronas, offer a customizable platform for mRNA delivery. Here, we present a library of 30 cationic micelle nanoparticles (MNPs) formulated from diblock copolymers with reactive poly(pentafluorophenol acrylate) backbones modified with a diverse set of amines. This library systematically varies in nitrogen-based cationic functionalities, exhibiting a spectrum of properties that encompass varied degrees of alkyl substitution (A1-A5), piperazine (A6), oligoamine (A7), guanidinium (A8), and hydroxylation (A9-A10), which vary in sidechain volume, substitution pattern, hydrophilicity, and pKa to assess parameter impact on mRNA delivery. In vitro delivery assays using GFP+ mRNA across multiple cell lines reveal that amine sidechain bulk and chemical structure critically affect performance. Using machine learning analysis via SHapley Additive exPlanations (SHAP) on 3,780 experimental data points, we mapped key relationships between amine chemistry and performance metrics, finding that amine-specific binding efficiency was a major determinant of mRNA delivery efficacy, cell viability, and GFP intensity. Micelles with stronger mRNA binding capabilities (A1 and A7) have higher cellular delivery performance, whereas those with intermediate binding tendencies (A2 and A10) deliver a higher amount of functional mRNA per cell. This indicates that balancing the binding strength is crucial for performance. Micelles with hydrophobic and bulky pendant groups (A3, A4, and A5) tend to induce necrosis during cellular delivery, highlighting the significance of chemical optimization. The cationic amphiphile A7, which displays a primary and a secondary amine, consistently demonstrates the highest GFP expression across various cell types and, in vivo, achieves high delivery specificity to lung tissue upon intravenous administration.
Moreover, we established a strong correlation between in vitro and in vivo performance using Multitask Gaussian Process models, linking amine properties directly to both delivery efficacy and biodistribution. This correlation underscores the predictive power of in vitro models for anticipating in vivo outcomes and highlights amine-dependent chemical optimization as crucial for advancing mRNA delivery vehicle development. Overall, this study integrates advanced data science with experimental design, demonstrating the pivotal role of amine chemical identity for targeted mRNA delivery to the lungs.
Heterogeneous nucleation is the dominant form of liquid-to-solid transition in nature. Although molecular simulations are uniquely suited to studying nucleation, the waiting time to observe even a single nucleation event can easily exceed current computational capabilities. Therefore, there is a pressing need for methods that enable computationally fast and feasible studies of heterogeneous nucleation. Seeding is a technique that has proven successful at dramatically expanding the range of computationally accessible nucleation rates in simulation studies of homogeneous crystal nucleation. In this article, we introduce a new seeding method for heterogeneous nucleation called Rigid Seeding (RSeeds). Crystalline seeds are treated as pseudorigid bodies and simulated on a surface with metastable liquid above its melting temperature. This allows the seeds to adapt to the surface and identify favorable seed-surface configurations, which is necessary for reliable predictions of the crystal polymorphs that form and the corresponding heterogeneous nucleation rates. We demonstrate and validate RSeeds for heterogeneous ice nucleation on a flexible self-assembled monolayer surface, a mineral surface based on kaolinite, and two model surfaces. RSeeds predicts the correct ice polymorph, exposed crystal plane, and rotation on the surface. RSeeds is semiquantitative and can be used to estimate the critical nucleus size and nucleation rate when combined with classical nucleation theory. We demonstrate that RSeeds can be used to evaluate nucleation rates spanning many orders of magnitude.
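As a rough illustration of how a seeding result is combined with classical nucleation theory (CNT), the sketch below evaluates the textbook CNT expressions for the barrier, the Zeldovich factor, and the rate from a critical nucleus size. This is a generic CNT calculation, not the RSeeds workflow itself; the function name `cnt_rate`, its arguments, and the numerical inputs are hypothetical, and the attachment rate and site density are assumed to come from separate simulations.

```python
import math

def cnt_rate(n_star, delta_mu_kT, attachment_rate, site_density):
    """Return (barrier in kT, Zeldovich factor, nucleation rate) from CNT.

    n_star          : critical nucleus size in molecules (e.g. from seeding)
    delta_mu_kT     : |chemical potential difference| between phases, in kT
    attachment_rate : rate at which molecules attach to the critical nucleus
    site_density    : number density of possible nucleation sites
    """
    if n_star <= 0 or delta_mu_kT <= 0:
        raise ValueError("n_star and delta_mu_kT must be positive")
    # CNT barrier height: Delta G* / kT = N* |Delta mu| / (2 kT)
    barrier = 0.5 * n_star * delta_mu_kT
    # Zeldovich factor: Z = sqrt(|Delta mu| / (6 pi kT N*))
    zeldovich = math.sqrt(delta_mu_kT / (6.0 * math.pi * n_star))
    # Rate: J = rho * Z * f+ * exp(-Delta G* / kT)
    rate = site_density * zeldovich * attachment_rate * math.exp(-barrier)
    return barrier, zeldovich, rate

# Illustrative, made-up inputs in arbitrary consistent units.
barrier, zeldovich, rate = cnt_rate(
    n_star=100, delta_mu_kT=0.5, attachment_rate=1.0e11, site_density=1.0e18
)
```

Because the barrier enters through an exponential, modest changes in the critical nucleus size shift the estimated rate by orders of magnitude, which is consistent with describing such estimates as semiquantitative.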
The mechanism of nucleation of clathrate hydrates of a water-soluble guest molecule is rigorously investigated with molecular dynamics (MD) simulations. Results from forward flux sampling, committor probability analysis, and twenty straightforward MD trajectories were combined to create a comprehensive understanding of the nucleation mechanism. Seven different classes of order parameters with a total of 33 individual variants were studied. We rank and evaluate the efficacy of prospective reaction coordinate models built from these order parameters and linear combinations thereof. Order parameters based upon water structuring provide a better approximation of the reaction coordinate than those based upon guest structuring. Our calculations suggest that the transition state is characterized by 2–3 partial, face-sharing 5¹² cages that form a structural motif observed in the structure II crystal. Further simulations show that once formed, this structure significantly affects the ordering of vicinal guest molecules, likely leading to hydrate nucleation. Our results contribute to the current understanding of the water–guest interplay involved in hydrate nucleation and have relevance to hydrate-based technologies that use water-soluble guest molecules (e.g., tetrahydrofuran) in mixed hydrate systems.
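The idea behind committor probability analysis can be shown on a toy model. The committor of a configuration is the fraction of trajectories started there that reach the product state before the reactant state, and transition-state configurations have a committor near 0.5. The sketch below estimates it for a one-dimensional symmetric random walk, which stands in for the MD system purely for illustration; the function name `committor_estimate` and its parameters are hypothetical.

```python
import random

def committor_estimate(start, a, b, n_trials, rng):
    """Fraction of walks from `start` that reach b (product) before a (reactant).

    start    : initial integer position, with a <= start <= b
    a, b     : absorbing boundaries for the reactant and product states
    n_trials : number of independent walks to launch
    rng      : a random.Random instance, passed in for reproducibility
    """
    hits_b = 0
    for _ in range(n_trials):
        x = start
        while a < x < b:
            x += rng.choice((-1, 1))  # unbiased unit step
        if x >= b:
            hits_b += 1
    return hits_b / n_trials

rng = random.Random(0)
p_mid = committor_estimate(5, 0, 10, 2000, rng)
```

For this walk the exact committor is (start - a) / (b - a), so the midpoint should give an estimate close to 0.5. In a real system, a good reaction coordinate is one whose value predicts the committor well.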