Data and Code for "Contrasting Historical and Physical Perspectives in Asymmetric Catalysis: ∆∆G‡ versus enantiomeric excess"

Ruth, Marcel

dc.contributor	Gensch, Tobias
dc.contributor	Schreiner, Peter Richard
dc.contributor.author	Ruth, Marcel
dc.contributor.other	Institute of Organic Chemistry, Justus Liebig University	de_DE
dc.contributor.other	Institute of Chemistry, TU Berlin	de_DE
dc.date.accessioned	2023-10-17T09:04:52Z
dc.date.available	2023-10-17T09:04:52Z
dc.date.issued	2023-10-13
dc.description.abstract	This repository contains all datasets that were used to evaluate the difference between ee and ΔΔG‡ modeling in enantioselective organocatalytic reactions. The scripts and notebooks used are also included to document our modeling process. All descriptor- and fingerprint-based models are included in "descriptorbased_parametric_models-repeated.ipynb". The evaluations and hyperparameter optimizations for our graph neural network are split into several small scripts and helper functions (essentially all of the Python files). Article abstract: The modeling of catalytic, enantioselective reactions is pivotal for chiral drug development, green chemistry, and industrial applications. While ligand-based and quantitative structure-activity relationships have a long history, the limitations of these methods, including inadequate representation of reaction dynamics and physical constraints, have become increasingly evident. With the rise of machine learning due to increased computational power, the modeling of chemical systems has reached a new era and has the potential to revolutionize how we understand and predict reactions. Here we probe the historic dependence on enantiomeric excess (ee) as a target variable and discuss the benefits of instead using physically grounded differences in Gibbs free activation energies (ΔΔG‡). We outline key benefits, such as enhanced modeling performance using ΔΔG‡, escaping physical limitations, addressing temperature effects, managing non-linear error propagation, adjusting for data distributions, and dealing with unphysical predictions. For this endeavor, we gathered ten datasets from the literature covering very different reaction types, e.g., hydrogenation, Suzuki, and Heck reactions, with 2761 data points in total. We evaluated fingerprint-, descriptor-, and graph-neural-network-based models. Our results highlight the distinction in performance among varying model complexities and emphasize the importance of choosing suitable metrics for accurate and robust chemical modeling.	de_DE
dc.description.sponsorship	German Research Foundation (DFG)	de_DE
dc.identifier.uri	https://jlupub.ub.uni-giessen.de//handle/jlupub/18554
dc.identifier.uri	http://dx.doi.org/10.22029/jlupub-17918
dc.language.iso	en	de_DE
dc.relation	http://dx.doi.org/10.22029/jlupub-18463
dc.rights	CC0 1.0 Universal	*
dc.rights.uri	http://creativecommons.org/publicdomain/zero/1.0/	*
dc.subject	Machine Learning	de_DE
dc.subject	Organocatalysis	de_DE
dc.subject	Literature Data	de_DE
dc.subject	Chemistry	de_DE
dc.subject.ddc	ddc:540	de_DE
dc.title	Data and Code for "Contrasting Historical and Physical Perspectives in Asymmetric Catalysis: ∆∆G‡ versus enantiomeric excess"	de_DE
dc.type	Dataset	de_DE
local.affiliation	FB 08 - Biologie und Chemie	de_DE
local.project	SPP 2363, Schr 597/41-1	de_DE
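
The abstract above contrasts ee and ΔΔG‡ as target variables. As a minimal sketch of how the two quantities relate, the following assumes the textbook relation ΔΔG‡ = RT ln((1 + ee)/(1 − ee)) for a kinetically controlled reaction, with ee given in percent. The function names `ee_to_ddg` and `ddg_to_ee`, the kJ/mol unit choice, and the default temperature are illustrative assumptions; the record does not state which exact form the notebook and scripts use.

```python
import math

# Molar gas constant in kJ/(mol*K); the unit choice is an assumption of this sketch.
R_KJ_PER_MOL_K = 8.314462618e-3


def ee_to_ddg(ee_percent, temperature_k=298.15):
    """Convert an enantiomeric excess (in %) to a ΔΔG‡ value in kJ/mol."""
    ee = ee_percent / 100.0
    if not -1.0 < ee < 1.0:
        # ee = ±100 % would correspond to an infinite energy difference.
        raise ValueError("ee must lie strictly between -100 and 100 percent")
    return R_KJ_PER_MOL_K * temperature_k * math.log((1.0 + ee) / (1.0 - ee))


def ddg_to_ee(ddg_kj_per_mol, temperature_k=298.15):
    """Convert a ΔΔG‡ value in kJ/mol back to an enantiomeric excess in %."""
    # Inverse of the relation above: ee = tanh(ΔΔG‡ / (2RT)).
    return 100.0 * math.tanh(ddg_kj_per_mol / (2.0 * R_KJ_PER_MOL_K * temperature_k))
```

Because tanh is bounded, any predicted ΔΔG‡ maps back to an ee between -100 % and 100 %, which is one way to read the abstract's point about unphysical predictions. The explicit `temperature_k` argument likewise reflects the temperature dependence mentioned there.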

Files

Original bundle

Name:: descriptorbased_parametric_models-repeated.ipynb
Size:: 576.95 KB
Format:: Unknown data format
Description:: Notebook used to evaluate the descriptor- and fingerprint-based models

Name:: datasets_230520.zip
Size:: 582.37 KB
Format:: Unknown data format
Description:: All datasets that we used for testing and modeling. The ZIP file contains each dataset as a separate CSV file with "," as the delimiter. The data were extracted from the literature.

Name:: GNN.zip
Size:: 11.41 KB
Format:: Unknown data format
Description:: This ZIP file contains all the Python files that were used for the hyperparameter optimization, validation, and testing of the graph-neural-network-based models. Each Python file is named after its function.

Name:: README.md
Size:: 2.36 KB
Format:: Unknown data format
Description:: Details about the datasets in the datasets_230520.zip file. The references for each CSV file are also included in this MD file.
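
The description of datasets_230520.zip above states that each dataset is a separate CSV file with "," as the delimiter. A minimal sketch of reading such an archive with the Python standard library could look as follows. The helper name `read_csvs_from_zip`, the UTF-8 encoding, and the presence of a header row in each CSV are assumptions of this sketch, not facts stated in the record.

```python
import csv
import io
import zipfile


def read_csvs_from_zip(zip_path):
    """Return a dict mapping each CSV member name to a list of row dicts."""
    tables = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            # Skip anything in the archive that is not a CSV file.
            if not name.lower().endswith(".csv"):
                continue
            with archive.open(name) as raw:
                # Assumption: UTF-8 text (an optional BOM is tolerated) with a header row.
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                tables[name] = list(csv.DictReader(text, delimiter=","))
    return tables


# Hypothetical usage, assuming the archive was downloaded next to this script:
# tables = read_csvs_from_zip("datasets_230520.zip")
# for name, rows in tables.items():
#     print(name, len(rows))
```

All values are returned as strings; the column names and their meaning are documented per dataset in README.md, so any numeric conversion is best done after consulting that file.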

License bundle

Name:: license.txt
Size:: 7.58 KB
Format:: Item-specific license agreed upon to submission
Description:

Collections

Forschungsdaten (Research Data)