Add args for prediction type #2

Open

dswigh opened this issue Mar 11, 2023 · 2 comments

Collaborator

dswigh commented Mar 11, 2023 •

edited

Add 2 new args:

prediction_type (or a similar name), e.g. yield_prediction, only_mapped_reaction, condition_prediction

If the user only wants the mapped reaction strings, we should bypass the sanity checks on the reaction conditions, which ultimately leaves a larger dataset to work with. Likewise for yield prediction (where we remove reactions without yields), etc.

data_set: only_uspto, all_available

For benchmarking purposes, it would be great to have one option that always generates the same dataset (e.g. only USPTO data), and another option that just includes all data currently stored in USPTO.
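To make the proposal concrete, here is a minimal sketch of how the two args could be exposed with `argparse`. The option values are taken from the list above; the flag spellings, the defaults, the helper name `filters_for`, and the exact mapping from prediction type to filters are assumptions for illustration, not the project's actual interface.

```python
import argparse

# Values proposed in this issue; the exact spellings are assumptions.
PREDICTION_TYPES = ("yield_prediction", "only_mapped_reaction", "condition_prediction")
DATA_SETS = ("only_uspto", "all_available")


def build_parser():
    """Build a parser exposing the two proposed args (hypothetical flag names)."""
    parser = argparse.ArgumentParser(description="Extract a reaction dataset.")
    parser.add_argument(
        "--prediction_type",
        choices=PREDICTION_TYPES,
        default="condition_prediction",  # assumed default
        help="Task the extracted dataset is intended for.",
    )
    parser.add_argument(
        "--data_set",
        choices=DATA_SETS,
        default="only_uspto",  # assumed default: the reproducible option
        help="Which source data to include.",
    )
    return parser


def filters_for(prediction_type):
    """Map a prediction type to the filters it implies.

    Assumption: condition sanity checks are skipped only when the user wants
    mapped reaction strings, and yields are required only for yield prediction.
    """
    if prediction_type not in PREDICTION_TYPES:
        raise ValueError(f"unknown prediction type: {prediction_type}")
    return {
        "check_conditions": prediction_type != "only_mapped_reaction",
        "require_yield": prediction_type == "yield_prediction",
    }
```

For example, `build_parser().parse_args(["--prediction_type", "only_mapped_reaction"])` would select the mode that skips the condition checks, and `filters_for(args.prediction_type)` would tell the cleaning step which filters to run.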

dswigh added the enhancement label

dswigh changed the title ~~Extract only mapped reactions~~ Add args for prediction type

Collaborator Author

dswigh commented Mar 30, 2023

Instead of having a 'prediction type', let's create two flat-file benchmarks, both extracting only USPTO data: one with the default settings, which removes/handles reactions with uncommon molecules, and another with all the arg settings set to 0.
This has been implemented!

Collaborator Author

dswigh commented Apr 13, 2023

When creating flat files for benchmarking, we should create train/val/test splits (80/10/10), splitting the data in 3 different ways: random, temporal (by grant date), and by rxn class (both by super class, which is very hard, and by sub-class, which is medium difficulty).
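A rough sketch of the three splitting strategies, using only the standard library. The record layout is an assumption: each reaction is a dict with a sortable `grant_date` (e.g. an ISO date string) and a dotted `rxn_class` label such as `"1.2.3"`, where the first field is the super class and the first two fields are the sub-class. All function and key names are hypothetical. Note that the class-based split assigns whole classes to a split, so the 80/10/10 ratio holds over classes and only approximately over reactions.

```python
import random
from collections import defaultdict

FRACTIONS = (0.8, 0.1, 0.1)  # train/val/test, as proposed above


def _cut(items, fractions=FRACTIONS):
    """Cut an ordered list into train/val/test; the remainder goes to test."""
    n_train = int(len(items) * fractions[0])
    n_val = int(len(items) * fractions[1])
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


def random_split(records, seed=0):
    """Shuffle with a fixed seed so the split is reproducible, then cut."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return _cut(shuffled)


def temporal_split(records, date_key="grant_date"):
    """Train on the oldest reactions, validate and test on the newest."""
    return _cut(sorted(records, key=lambda r: r[date_key]))


def class_split(records, level="super", class_key="rxn_class", seed=0):
    """Assign whole reaction classes to one split each.

    level="super" groups by the first field of the label (harder),
    level="sub" groups by the first two fields (medium difficulty).
    """
    depth = 1 if level == "super" else 2
    groups = defaultdict(list)
    for record in records:
        key = ".".join(record[class_key].split(".")[:depth])
        groups[key].append(record)
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)
    # Flatten the records of each group of classes into one list per split.
    return tuple(
        [record for key in part for record in groups[key]]
        for part in _cut(keys)
    )
```

Each function returns a `(train, val, test)` triple, so the same downstream code could write out the flat files regardless of which strategy was used.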