Data Mining the Cambridge Structural Database for Hydrate-Anhydrate Pairs with SMILES Strings

2020 
Many organic molecules can crystallize in either hydrated or anhydrous forms. Predicting the formation of hydrates and their relative stability with respect to water-free alternative phases are significant challenges. Here we use the Cambridge Structural Database (CSD) and data informatics to identify and analyze hydrate-anhydrate structure pairs. A search method was developed based on Simplified Molecular-Input Line-Entry strings (SMILES) matching and implemented through the CSD Python Application Programming Interface (API). Of the >23,000 molecular hydrates containing no metal ions, ~1,400 were found have at least one corresponding anhydrous form, yielding just over 2,000 unique pairs in the CSD. Hydrates with and without a reported anhydrate showed a similar distribution in their water stoichiometries. Lattice symmetry and packing fraction comparisons are reported for the paired hydrates and anhydrates. Structure pairs with one organic component and multiple organic components showed some subtle differences. The details and limitations of the method are in a way that can encourage and guide other types of CSD searches using SMILES.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    31
    References
    9
    Citations
    NaN
    KQI
    []