Our research on predicting whether a particular chemical reaction will work.

At the events in York and Cambridge, we had an amazing opportunity to present our research for the first time! We evaluated several existing and novel machine learning models for determining reaction feasibility. We found that our best model generalizes surprisingly well on the task and performs similarly to a heuristic constructed by an expert for a specific reaction type.

To train and evaluate our models, we used 400 000 reactions scraped from publicly available US patents (USPTO) as "true" (positive) reactions. We found about 1600 commonly occurring reaction templates in the dataset. We generated negative samples for each reaction by applying its template at every other matching site in the substrates. The models were then evaluated on how well they discriminate between the two classes of reactions, using the area under the ROC curve (AUC) as the metric. We split the data into training and test sets by clustering reactions on the chemical fingerprints of their substrates.
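To make the evaluation metric concrete, here is a minimal sketch of ROC AUC computed as the fraction of (positive, negative) pairs in which the positive reaction receives the higher score. The function name `roc_auc` and the example labels and scores are illustrative assumptions, not taken from our code or results.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outscores a random negative.

    labels: 1 for real reactions, 0 for generated negatives.
    scores: model-predicted feasibility, higher means more plausible.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical scores for two real reactions and three generated negatives.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.1]
auc = roc_auc(labels, scores)
```

The pairwise loop is quadratic in the number of samples, so on a dataset of our size a rank-based implementation from a standard library would be the practical choice.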

Figure: Negative sample generation process. We apply the reaction template at every matching site in the substrate and treat the reactions occurring at sites other than the original one as negatives. These reactions are considered implausible because a different matching site was preferred in the real reaction from the dataset.
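The negative sampling idea can be sketched in a few lines. This is a simplified illustration, not our actual pipeline: it assumes the matching sites have already been found by a substructure matcher, and the `Sample` class, the `label_sites` function, and the atom indices are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    substrate: str      # identifier of the substrate (e.g. a SMILES string)
    template_id: str    # which reaction template was applied
    site: tuple         # atom indices where the template was applied
    label: int          # 1 = reaction from the dataset, 0 = generated negative


def label_sites(substrate, template_id, matching_sites, true_site):
    """Turn every site where the template matches into a labelled sample.

    The site used by the recorded reaction becomes the positive sample;
    every other matching site becomes a negative sample.
    """
    if true_site not in matching_sites:
        raise ValueError("the recorded site must be among the matching sites")
    return [
        Sample(substrate, template_id, site, 1 if site == true_site else 0)
        for site in matching_sites
    ]


# Hypothetical substrate whose template matches at three positions.
samples = label_sites(
    substrate="substrate_A",
    template_id="template_42",
    matching_sites=[(2,), (4,), (6,)],
    true_site=(4,),
)
negatives = [s for s in samples if s.label == 0]
```

A real pipeline would probably also need to drop candidate sites that yield the same product as the recorded reaction (for example, symmetry-equivalent positions), since those are not true negatives.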

We tested several neural network architectures: a model operating on reaction fingerprints, a Convolutional Neural Network, and a few Graph Convolutional Networks. In particular, we tested the Edge Attention Graph Convolutional Network (EAGCN), which we adapted to chemical reactions. We also enhanced it with a multi-headed self-attention mechanism. Our best model, EAGCN with two attention heads, reached an AUC score of 0.99, compared to 0.95 for the second-best model, EAGCN with a single attention head. See the poster for more details on our improvements over EAGCN.

To better understand how well our models generalize, we compared them to a heuristic developed by a chemist specifically for a popular reaction type (aromatic nitration). Our best model achieves performance comparable to the expert heuristic (an F1 score of 0.81, compared to 0.82 for the heuristic), even though we trained it on the whole dataset of reactions, while the heuristic was crafted specifically for this reaction type. This demonstrates the potential of machine learning for the task. The expert heuristic was both very time-consuming to construct and limited in application to a narrow portion of the dataset. Learned models do not have these limitations.
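For readers unfamiliar with the F1 score used in this comparison, the sketch below shows how it is computed from binary predictions as the harmonic mean of precision and recall. The `f1_score` function and the example predictions are illustrative assumptions and do not reproduce the numbers above.

```python
def f1_score(y_true, y_pred):
    """F1 score for binary labels: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no correct positive predictions
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = feasible nitration site, 0 = infeasible.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
f1 = f1_score(y_true, y_pred)
```

Because F1 is computed on hard predictions, a model that outputs a score has to be thresholded first, whereas a rule-based heuristic typically gives a yes/no answer directly.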

Figure: Accuracy of our models measured on the whole dataset (blue) and on the nitration reactions (green). The enhanced EAGCN model achieves performance comparable to the human-crafted heuristic on the diagnostic nitration reactions.

We also evaluated our models on reaction types that were absent from the training data. In some cases the models still perform well on these unseen reaction types, which indicates an ability to generalize to previously unknown chemistry.

Figure: An example of the aromatic nitration reaction type, on which we achieve results comparable to a human-crafted heuristic.

Our results show that Graph Neural Networks generalize far better on this task than the other models we tested. We have applied the best of the described models to aid in retrosynthesis planning. We also plan to release our datasets as a benchmark for the field, as well as code to train the models.

For more information, please take a look at our poster!




Mikołaj Sacha