When SMILES smiles, Practicality Judgment and Yield Prediction of Chemical Reaction via Deep Chemical Language Processing

2021 
The Simplified Molecular Input Line Entry System (SMILES) provides a text-based encoding for describing the structure of chemical species and formalizing general chemical reactions. Because chemical reactions can thus be represented in a language-like form, we present a symbol-only model that predicts the yield of organic synthesis reactions in general, without complex quantum-physical modeling or chemistry knowledge. Our model is the first deep neural network application that treats chemical reaction text segments as embedding representations for recent deep natural language processing methods. Experimental results show that our model predicts chemical reactions effectively, achieving an accuracy of 99.76% on practicality judgment and a Root Mean Square Error (RMSE) of around 0.2 on yield prediction. Our work shows the great potential of automatic yield prediction for organic reactions under general conditions, and of further applications in synthesis path prediction, at minimal modeling cost.
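To make the "symbol-only" idea concrete, the following is a minimal sketch of how a reaction written in SMILES might be split into text symbols before embedding, and how an RMSE over predicted yields is computed. The regular expression is a commonly used SMILES tokenization pattern, not necessarily the tokenizer used in this work; the function names, the example reaction, and the yield values are all illustrative assumptions.

```python
import math
import re

# Assumption: a widely used regex-style SMILES tokenizer, not the paper's own.
# Bracket atoms and two-letter halogens are matched before single-letter atoms.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|[BCNOSPFIbcnosp]|%\d{2}|\d|[=#\-\+\(\)\./\\@:~\*\$>]"
)


def tokenize_reaction(rxn: str) -> list[str]:
    """Split a reaction SMILES string into symbol tokens."""
    tokens = SMILES_TOKEN.findall(rxn)
    # Refuse input containing characters the pattern does not cover.
    if "".join(tokens) != rxn:
        raise ValueError(f"untokenizable characters in: {rxn!r}")
    return tokens


def rmse(predicted: list[float], observed: list[float]) -> float:
    """Root mean square error between predicted and observed values."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative reaction: acetic acid + ethanol >> ethyl acetate.
tokens = tokenize_reaction("CC(=O)O.OCC>>CC(=O)OCC")
print(tokens)

# Made-up yields, purely to exercise the metric.
print(rmse([1.0, 3.0], [1.0, 1.0]))
```

In a full pipeline each token would be mapped to an integer index and then to a learned embedding vector; that step depends on the chosen deep learning framework and is omitted here.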