Add args for prediction type #2

Open
dswigh opened this issue Mar 11, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

dswigh (Collaborator) commented Mar 11, 2023

Add two new args:

  1. prediction_type (or something like that), e.g. yield prediction, only_mapped_reaction, condition_prediction
    • If the user only wants the mapped reaction strings, we should bypass the sanity checks for the reaction conditions, which results in a larger dataset to work with. Likewise for yield prediction (where we remove reactions without yields), etc.
  2. Data_set: only_uspto, all_available
    • For benchmarking purposes, it would be great to have one option that always generates the same dataset (e.g. only USPTO data), and another that includes all data currently available.
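A minimal sketch of how the two proposed args might be parsed and mapped to filters, assuming Python's argparse. The flag names `--prediction_type` and `--data_set`, the choice strings, and the filter names returned by the hypothetical `active_filters` helper are illustrative assumptions, not the project's actual CLI.

```python
from __future__ import annotations

import argparse

# Hypothetical flag names and choices based on the proposal above;
# they are not an existing interface of this project.
PREDICTION_TYPES = ("yield_prediction", "only_mapped_reaction", "condition_prediction")
DATA_SETS = ("only_uspto", "all_available")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Reaction extraction (sketch)")
    parser.add_argument("--prediction_type", choices=PREDICTION_TYPES,
                        default="condition_prediction",
                        help="which task the extracted dataset is meant for")
    parser.add_argument("--data_set", choices=DATA_SETS, default="only_uspto",
                        help="only_uspto gives a fixed benchmark; all_available uses everything")
    return parser


def active_filters(prediction_type: str) -> set[str]:
    """Return the names of the (hypothetical) filters to run for a task.

    Assumption: each task keeps only the checks it needs, so
    only_mapped_reaction skips the reaction-condition checks entirely.
    """
    filters = {"has_mapped_reaction"}
    if prediction_type == "condition_prediction":
        filters.add("condition_sanity_checks")
    elif prediction_type == "yield_prediction":
        filters.add("has_yield")  # drop reactions without a reported yield
    return filters
```

For example, `build_parser().parse_args(["--prediction_type", "only_mapped_reaction"])` leaves `data_set` at its `only_uspto` default, and `active_filters` then returns only `{"has_mapped_reaction"}`, so the condition checks are skipped. Keeping the task-to-filter mapping in one function means a new prediction type touches a single place.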
@dswigh dswigh added the enhancement New feature or request label Mar 11, 2023
@dswigh dswigh changed the title Extract only mapped reactions Add args for prediction type Mar 16, 2023
dswigh (Collaborator, Author) commented Mar 30, 2023

  1. Instead of having a 'prediction type', let's create two flat file benchmarks, both just extracting USPTO data, but one with default settings that removes/handles reactions with uncommon molecules, and another with all the arg settings set to 0.
  2. This has been implemented!

dswigh (Collaborator, Author) commented Apr 13, 2023

  1. When creating flat files for benchmarking, we should create train/val/test splits (80/10/10), splitting the data in three different ways: random, temporal (by grant date), and by reaction class, using both super classes (very hard) and sub-classes (medium difficulty).
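The three splitting strategies could look roughly like the sketch below, using only the Python standard library. The record fields `grant_date` and `rxn_class`, the dotted class-label format, and the helper names are assumptions for illustration. The class-based split here is a group split, in which no class appears in more than one partition.

```python
from __future__ import annotations

import random
from collections import Counter
from datetime import date, timedelta

# Hypothetical record layout (not the project's real schema): each reaction is a
# dict with a "grant_date" (datetime.date) and an "rxn_class" label such as "1.2.3",
# where the first component is the super class and the full label is the sub-class.
FRACTIONS = (0.8, 0.1, 0.1)  # train / val / test


def _cut(items: list) -> tuple[list, list, list]:
    """Slice an already ordered list into train/val/test by FRACTIONS."""
    n_train = round(FRACTIONS[0] * len(items))
    n_val = round(FRACTIONS[1] * len(items))
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


def random_split(records: list[dict], seed: int = 0):
    """Shuffle with a fixed seed so the benchmark is reproducible."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return _cut(shuffled)


def temporal_split(records: list[dict]):
    """Oldest grants go to train, the most recent ones to test."""
    return _cut(sorted(records, key=lambda r: r["grant_date"]))


def class_split(records: list[dict], level: str = "super", seed: int = 0):
    """Place every reaction class entirely in one split (no class leakage).

    Classes are indivisible, so split sizes only approximate 80/10/10.
    """
    def key(r: dict) -> str:
        label = r["rxn_class"]
        return label.split(".")[0] if level == "super" else label

    counts = Counter(key(r) for r in records)
    classes = sorted(counts)
    random.Random(seed).shuffle(classes)

    total = len(records)
    train_cut = round(FRACTIONS[0] * total)
    val_cut = train_cut + round(FRACTIONS[1] * total)

    # Each class goes to the first split whose reaction budget is not yet reached.
    assignment, running = {}, 0
    for c in classes:
        assignment[c] = 0 if running < train_cut else (1 if running < val_cut else 2)
        running += counts[c]

    splits: tuple[list, list, list] = ([], [], [])
    for r in records:
        splits[assignment[key(r)]].append(r)
    return splits


if __name__ == "__main__":
    # Toy data: 100 reactions, one per day, with 5 super classes and 10 sub-classes.
    toy = [
        {"id": i, "grant_date": date(1980, 1, 1) + timedelta(days=i),
         "rxn_class": f"{i % 5}.{i % 10}"}
        for i in range(100)
    ]
    for name, parts in [("random", random_split(toy)),
                        ("temporal", temporal_split(toy)),
                        ("sub-class", class_split(toy, level="sub"))]:
        print(name, [len(p) for p in parts])
```

On the toy data, the random, temporal, and sub-class splits each come out at 80/10/10. By contrast, `class_split(toy)` at the super-class level gives 80/20/0: with only five equally sized super classes, no class is left for the test set. This shows that class-level splits can only approximate the 80/10/10 target when there are few classes.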
