Data and Code for "Contrasting Historical and Physical Perspectives in Asymmetric Catalysis: ∆∆G‡ versus enantiomeric excess"
dc.contributor | Gensch, Tobias | |
dc.contributor | Schreiner, Peter Richard | |
dc.contributor.author | Ruth, Marcel | |
dc.contributor.other | Institute of Organic Chemistry, Justus Liebig University | de_DE |
dc.contributor.other | Institute of Chemistry, TU Berlin | de_DE |
dc.date.accessioned | 2023-10-17T09:04:52Z | |
dc.date.available | 2023-10-17T09:04:52Z | |
dc.date.issued | 2023-10-13 | |
dc.description.abstract | This repository contains all datasets that were used to evaluate the difference between ee and ΔΔG‡ modeling in enantioselective organocatalytic reactions. The scripts and notebooks used are also included to document our modeling process. All descriptor- and fingerprint-based models are included in "descriptorbased_parametric_models-repeated.ipynb". The evaluations and hyperparameter optimizations for our graph neural network are split into several small scripts and helper functions (all of the Python files). Article abstract: The modeling of catalytic, enantioselective reactions is pivotal for chiral drug development, green chemistry, and industrial applications. While ligand-based and quantitative structure-activity relationships have a long history, the limitations of these methods, including inadequate representation of reaction dynamics and physical constraints, have become increasingly evident. With the rise of machine learning due to increased computational power, the modeling of chemical systems has reached a new era and has the potential to revolutionize how we understand and predict reactions. Here we probe the historic dependence on utilizing enantiomeric excess (ee) as a target variable and discuss the benefits of instead using physically grounded differences in Gibbs free activation energies (ΔΔG‡). We outline key benefits, such as enhanced modeling performance using ΔΔG‡, escaping physical limitations, addressing temperature effects, managing non-linear error propagation, adjusting for data distributions, and handling unphysical predictions. For this endeavor, we gathered ten datasets from the literature covering very different reaction types, e.g., hydrogenation, Suzuki, and Heck reactions, totaling 2761 data points. We evaluated fingerprint-, descriptor-, and graph neural network-based models. Our results highlight the distinction in performance among varying model complexities and emphasize the importance of choosing suitable metrics for accurate and robust chemical modeling. | de_DE |
dc.description.sponsorship | German Research Foundation (DFG) | de_DE |
dc.identifier.uri | https://jlupub.ub.uni-giessen.de//handle/jlupub/18554 | |
dc.identifier.uri | http://dx.doi.org/10.22029/jlupub-17918 | |
dc.language.iso | en | de_DE |
dc.relation | http://dx.doi.org/10.22029/jlupub-18463 | |
dc.rights | CC0 1.0 Universal | * |
dc.rights.uri | http://creativecommons.org/publicdomain/zero/1.0/ | * |
dc.subject | Machine Learning | de_DE |
dc.subject | Organocatalysis | de_DE |
dc.subject | Literature Data | de_DE |
dc.subject | Chemistry | de_DE |
dc.subject.ddc | ddc:540 | de_DE |
dc.title | Data and Code for "Contrasting Historical and Physical Perspectives in Asymmetric Catalysis: ∆∆G‡ versus enantiomeric excess" | de_DE |
dc.type | Dataset | de_DE |
local.affiliation | FB 08 - Biologie und Chemie | de_DE |
local.project | SPP 2363, Schr 597/41-1 | de_DE |
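The abstract contrasts ee and ΔΔG‡ as target variables. As a minimal sketch of how the two relate, the block below uses the standard textbook relation ΔΔG‡ = RT ln[(1 + ee)/(1 − ee)], which assumes that the enantiomeric ratio equals the ratio of the two competing rate constants. The function names `ee_to_ddg` and `ddg_to_ee` and the default temperature are illustrative assumptions; they are not taken from the deposited notebook or scripts.

```python
import math

# Gas constant in kJ mol^-1 K^-1
R = 8.314462618e-3


def ee_to_ddg(ee, temperature=298.15):
    """Convert ee (as a fraction, strictly between -1 and 1) to ΔΔG‡ in kJ/mol.

    Hypothetical helper: assumes the enantiomeric ratio equals the ratio
    of the two competing rate constants at the given temperature (in K).
    """
    if not -1.0 < ee < 1.0:
        # |ee| = 1 would correspond to an infinite energy difference
        raise ValueError("ee must lie strictly between -1 and 1")
    return R * temperature * math.log((1.0 + ee) / (1.0 - ee))


def ddg_to_ee(ddg, temperature=298.15):
    """Convert ΔΔG‡ in kJ/mol back to ee as a fraction.

    tanh is bounded, so any finite ΔΔG‡ maps to an ee strictly inside
    (-1, 1) in exact arithmetic.
    """
    return math.tanh(ddg / (2.0 * R * temperature))
```

Because the mapping is non-linear, a fixed error in ΔΔG‡ translates into a much smaller ee error near |ee| = 1 than near ee = 0, and the same ee corresponds to a smaller ΔΔG‡ at lower temperature. These are the error-propagation and temperature effects that the abstract mentions.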
Files
Original bundle
1 - 4 of 4
- Name:
- descriptorbased_parametric_models-repeated.ipynb
- Size:
- 576.95 KB
- Format:
- Unknown data format
- Description:
- Notebook used to evaluate the descriptor and fingerprint-based models
- Name:
- datasets_230520.zip
- Size:
- 582.37 KB
- Format:
- Unknown data format
- Description:
- All datasets that we used for testing and modeling. The ZIP file contains each dataset as a separate CSV file with "," as the delimiter. The data were extracted from the literature.
- Name:
- GNN.zip
- Size:
- 11.41 KB
- Format:
- Unknown data format
- Description:
- This ZIP file contains all the Python files that were used for the hyperparameter optimization, validation, and testing of the graph neural network-based models. Each Python file is named after its function.
- Name:
- README.md
- Size:
- 2.36 KB
- Format:
- Unknown data format
- Description:
- Details about the datasets that are found in the datasets_230520.zip file. The references for each CSV file are also included in this MD file.
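The description of datasets_230520.zip states that each dataset is a separate comma-delimited CSV file. A minimal sketch of reading such an archive with the Python standard library is given below. The helper name `load_datasets` and the UTF-8 encoding are assumptions; the column names are not listed in this record, so the sketch reads them from each file's header row rather than guessing them.

```python
import csv
import io
import zipfile


def load_datasets(zip_source):
    """Read every CSV file in a ZIP archive into a list of row dictionaries.

    Hypothetical helper: `zip_source` may be a path such as
    "datasets_230520.zip" or an open binary file object. UTF-8 encoding
    (with an optional byte-order mark) is an assumption.
    """
    datasets = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            # Skip directories and any non-CSV members of the archive
            if not name.lower().endswith(".csv"):
                continue
            with archive.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                # The record states that "," is the delimiter
                datasets[name] = list(csv.DictReader(text, delimiter=","))
    return datasets
```

Calling `load_datasets("datasets_230520.zip")` on a downloaded copy would return a dictionary keyed by CSV file name. README.md in the same bundle describes the individual datasets and gives their literature references.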
License bundle
1 - 1 of 1
- Name:
- license.txt
- Size:
- 7.58 KB
- Format:
- Item-specific license agreed upon to submission
- Description: