Data and Code for "Contrasting Historical and Physical Perspectives in Asymmetric Catalysis: ∆∆G‡ versus enantiomeric excess"

dc.contributor: Gensch, Tobias
dc.contributor: Schreiner, Peter Richard
dc.contributor.author: Ruth, Marcel
dc.contributor.other: Institute of Organic Chemistry, Justus Liebig University [de_DE]
dc.contributor.other: Institute of Chemistry, TU Berlin [de_DE]
dc.date.accessioned: 2023-10-17T09:04:52Z
dc.date.available: 2023-10-17T09:04:52Z
dc.date.issued: 2023-10-13
dc.description.abstract: This repository contains all datasets that were used to evaluate the difference between ee and ΔΔG‡ modeling in enantioselective organocatalytic reactions. The scripts and notebooks used are also included to document our modeling process. All descriptor- and fingerprint-based models are included in "descriptorbased_parametric_models-repeated.ipynb". The evaluations and hyperparameter optimizations of our graph neural network are split into several small scripts and helper functions (the Python files in GNN.zip).
Article abstract: The modeling of catalytic, enantioselective reactions is pivotal for chiral drug development, green chemistry, and industrial applications. While ligand-based and quantitative structure-activity relationships have a long history, the limitations of these methods, including inadequate representation of reaction dynamics and physical constraints, have become increasingly evident. With the rise of machine learning due to increased computational power, the modeling of chemical systems has reached a new era and has the potential to revolutionize how we understand and predict reactions. Here we probe the historic dependence on enantiomeric excess (ee) as a target variable and discuss the benefits of instead using the physically grounded difference in Gibbs free energies of activation (ΔΔG‡). We outline key benefits, such as enhanced modeling performance using ΔΔG‡, escaping physical limitations, addressing temperature effects, managing non-linear error propagation, adjusting for data distributions, and dealing with unphysical predictions. For this endeavor, we gathered ten datasets from the literature covering very different reaction types, e.g., hydrogenation, Suzuki, and Heck reactions, totaling 2761 data points. We evaluated fingerprint-, descriptor-, and graph-neural-network-based models. Our results highlight the distinction in performance among varying model complexities and emphasize the importance of choosing suitable metrics for accurate and robust chemical modeling. [de_DE]
dc.description.sponsorship: German Research Foundation (DFG) [de_DE]
dc.identifier.uri: https://jlupub.ub.uni-giessen.de//handle/jlupub/18554
dc.identifier.uri: http://dx.doi.org/10.22029/jlupub-17918
dc.language.iso: en [de_DE]
dc.relation: http://dx.doi.org/10.22029/jlupub-18463
dc.rights: CC0 1.0 Universal [*]
dc.rights.uri: http://creativecommons.org/publicdomain/zero/1.0/ [*]
dc.subject: Machine Learning [de_DE]
dc.subject: Organocatalysis [de_DE]
dc.subject: Literature Data [de_DE]
dc.subject: Chemistry [de_DE]
dc.subject.ddc: ddc:540 [de_DE]
dc.title: Data and Code for "Contrasting Historical and Physical Perspectives in Asymmetric Catalysis: ∆∆G‡ versus enantiomeric excess" [de_DE]
dc.type: Dataset [de_DE]
local.affiliation: FB 08 - Biologie und Chemie [de_DE]
local.project: SPP 2363, Schr 597/41-1 [de_DE]
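
The two target variables contrasted in the abstract can be related with a short sketch. It assumes the standard relation for a kinetically controlled reaction, ΔΔG‡ = RT ln((1 + ee)/(1 − ee)) with ee given as a fraction, which this record does not spell out. The function names, the kJ/mol unit, and the default temperature are illustrative choices and are not taken from the notebook or scripts in this repository.

```python
import math

# Gas constant in kJ/(mol*K); the unit choice is an assumption of this sketch.
R = 8.314462618e-3


def ee_to_ddg(ee, temperature_k=298.15):
    """Convert an enantiomeric excess (fraction, -1 < ee < 1) to ΔΔG‡ in kJ/mol.

    Uses ΔΔG‡ = RT ln((1 + ee) / (1 - ee)), written as 2RT artanh(ee).
    """
    if not -1.0 < ee < 1.0:
        raise ValueError("ee must lie strictly between -1 and 1")
    return 2.0 * R * temperature_k * math.atanh(ee)


def ddg_to_ee(ddg, temperature_k=298.15):
    """Convert ΔΔG‡ in kJ/mol back to an enantiomeric excess (fraction)."""
    return math.tanh(ddg / (2.0 * R * temperature_k))


# Hypothetical usage: equal steps in ee are unequal steps in ΔΔG‡.
for ee in (0.50, 0.90, 0.99):
    print(f"ee = {ee:.2f} -> ddG = {ee_to_ddg(ee):.2f} kJ/mol")
```

Under this relation, any finite ΔΔG‡ maps back to an ee strictly inside (−1, 1), whereas a model trained directly on ee can predict values beyond ±1. The temperature enters explicitly, so the same ee measured at different temperatures corresponds to different ΔΔG‡ values.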

Files

Original bundle

Name: descriptorbased_parametric_models-repeated.ipynb
Size: 576.95 KB
Format: Unknown data format
Description: Notebook used to evaluate the descriptor- and fingerprint-based models

Name: datasets_230520.zip
Size: 582.37 KB
Format: Unknown data format
Description: All datasets that we used for testing and modeling. The ZIP file contains each dataset as a separate CSV file with "," as the delimiter. The data were extracted from the literature.

Name: GNN.zip
Size: 11.41 KB
Format: Unknown data format
Description: This ZIP file contains all the Python files that were used for the hyperparameter optimization, validation, and testing of the graph-neural-network-based models. Each Python file is named after its function.

Name: README.md
Size: 2.36 KB
Format: Unknown data format
Description: Details about the datasets found in the datasets_230520.zip file. The references for each CSV file are also included in this MD file.
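
As a minimal sketch of how the comma-delimited CSV files in datasets_230520.zip might be read without unpacking the archive, the helper below uses only the Python standard library. The function name is hypothetical, and the sketch assumes UTF-8 encoding and a header row in each CSV file; neither is stated in this record, and the actual column names are documented in README.md rather than here.

```python
import csv
import io
import zipfile


def load_csv_datasets(zip_source):
    """Return {member name: list of row dicts} for every CSV file in a ZIP archive.

    zip_source may be a file path or a binary file-like object.
    Assumes UTF-8 text and a header row in each CSV (assumptions of this sketch).
    """
    datasets = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip anything that is not a CSV file
            with archive.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                datasets[name] = list(csv.DictReader(text, delimiter=","))
    return datasets


# Example (assumes the archive has been downloaded to the working directory):
# datasets = load_csv_datasets("datasets_230520.zip")
# for name, rows in datasets.items():
#     print(name, len(rows))
```

All values are returned as strings, so numeric columns would still need to be converted before modeling.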
License bundle

Name: license.txt
Size: 7.58 KB
Format: Item-specific license agreed upon to submission
