Data and Code for "Contrasting Historical and Physical Perspectives in Asymmetric Catalysis: ∆∆G‡ versus enantiomeric excess"


This repository contains all datasets that were used to evaluate the difference between ee and ΔΔG‡ modeling in enantioselective organocatalytic reactions.

The scripts and notebooks are also included to document our modeling process. All descriptor- and fingerprint-based models are in "descriptorbased_parametric_models-repeated.ipynb". The evaluation and hyperparameter optimization of our graph neural network are split across several small scripts and helper functions (essentially all Python files in the repository).

Article abstract: The modeling of catalytic, enantioselective reactions is pivotal for chiral drug development, green chemistry, and industrial applications. While ligand-based and quantitative structure-activity relationships have a long history, the limitations of these methods, including inadequate representation of reaction dynamics and physical constraints, have become increasingly evident. With the rise of machine learning due to increased computational power, the modeling of chemical systems has reached a new era and has the potential to revolutionize how we understand and predict reactions. Here we probe the historic dependence on enantiomeric excess (ee) as a target variable and discuss the benefits of instead using physically grounded differences in Gibbs free activation energies (ΔΔG‡). We outline key benefits of using ΔΔG‡, such as enhanced modeling performance, escaping physical limitations, addressing temperature effects, managing non-linear error propagation, adjusting for data distributions, and dealing with unphysical predictions. For this endeavor, we gathered ten datasets from the literature covering very different reaction types, e.g., hydrogenation, Suzuki, and Heck reactions, totaling 2761 data points. We evaluated fingerprint-, descriptor-, and graph neural network-based models. Our results highlight the distinction in performance among varying model complexities and emphasize the importance of choosing suitable metrics for accurate and robust chemical modeling.
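For readers unfamiliar with the two target variables, the following is a minimal sketch of the standard textbook relation between ee and ΔΔG‡ under kinetic control, ΔΔG‡ = RT ln((1 + ee)/(1 − ee)). It is an illustration only and is not taken from the repository's scripts; the function names `ee_to_ddg` and `ddg_to_ee`, the default temperature, and the choice of kJ/mol are assumptions made here.

```python
import math

# Gas constant in kJ/(mol*K); the unit choice is an assumption of this sketch.
R = 8.314462618e-3


def ee_to_ddg(ee, temperature_k=298.15):
    """Convert enantiomeric excess (fraction, -1 < ee < 1) to ddG in kJ/mol.

    Hypothetical helper: uses ddG = R*T*ln((1 + ee) / (1 - ee)).
    """
    if not -1.0 < ee < 1.0:
        # ee = +/-1 would map to an infinite energy difference.
        raise ValueError("ee must lie strictly between -1 and 1")
    return R * temperature_k * math.log((1.0 + ee) / (1.0 - ee))


def ddg_to_ee(ddg, temperature_k=298.15):
    """Convert ddG in kJ/mol back to ee (fraction).

    Equivalent to (er - 1) / (er + 1) with er = exp(ddG / (R*T)),
    which simplifies to tanh(ddG / (2*R*T)).
    """
    return math.tanh(ddg / (2.0 * R * temperature_k))


if __name__ == "__main__":
    # The same ee corresponds to different ddG values at different temperatures.
    for t in (195.15, 298.15, 373.15):
        print(f"T = {t:7.2f} K  ee = 0.90  ddG = {ee_to_ddg(0.90, t):.2f} kJ/mol")
```

Because the back-transformation is a hyperbolic tangent, any predicted ΔΔG‡ maps to an ee between -100% and +100%, whereas a model trained directly on ee can predict values outside that range. The mapping is also strongly non-linear near |ee| = 1, so a small error in ee there corresponds to a large error in ΔΔG‡.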



