Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added option to impute missing values during training #336

Open
wants to merge 19 commits into
base: main
Choose a base branch
from

Conversation

megancoden
Copy link

@megancoden megancoden commented Apr 12, 2023

Issues

Change(s) made

  • In preprocess.R, we added an option called impute_in_preprocessing that defaults to True. When it is true, it calls the impute function, which performs the imputation on the data. When it is set to false, this step gets skipped.
  • We created a file called impute.R where we moved code from preprocess.R that would be shared between files. We created two functions: impute(), which takes in a dataset and performs the imputation, and prep_data(), which gets the dataset into the proper format to perform imputation.
  • In run_ml, we added an option called impute_in_training, which defaults to False. When it is true, it calls prep_data() and performs the imputation step following the train/test data split. When it is false, it does not modify the input dataset.
  • We added test cases to test-run_ml.R and in test-preprocess.R to ensure that our functionality worked as expected.

Checklist

(Strikethrough any points that are not applicable.)

  • Write unit tests for any new functionality or bug fixes.
  • Update docs if there are any API changes:
    • roxygen comments
    • vignettes
  • Update NEWS.md if this includes any user-facing changes.
  • The check workflow succeeds on your most recent commit. This is always required before the PR can be merged.

R/preprocess.R Outdated Show resolved Hide resolved
R/preprocess.R Outdated Show resolved Hide resolved
R/run_ml.R Outdated Show resolved Hide resolved
R/preprocess.R Outdated Show resolved Hide resolved
R/run_ml.R Outdated Show resolved Hide resolved
tests/testthat/test-run_ml.R Outdated Show resolved Hide resolved
@kelly-sovacool
Copy link
Member

These are API changes, so we'll need to document how to use them in the 'preprocess' vignette and roxygen comments.

R/impute.R Outdated Show resolved Hide resolved
@kelly-sovacool
Copy link
Member

kelly-sovacool commented Apr 13, 2023

Tip: You can run devtools::check() to see if your local version of the package passes the checks successfully, even before you commit and push your changes.

https://r-pkgs.org/whole-game.html#check

@megancoden
Copy link
Author

megancoden commented Apr 18, 2023

These are API changes, so we'll need to document how to use them in the 'preprocess' vignette and roxygen comments.

Would you like us to include information about how to use the option we added to the run_ml function in the 'preprocess' vignette, or should this be included elsewhere?

@kelly-sovacool
Copy link
Member

These are API changes, so we'll need to document how to use them in the 'preprocess' vignette and roxygen comments.

Would you like us to include information about how to use it in the run_ml function in the 'preprocess' vignette, or should this be included elsewhere?

I would discuss it in a new subheading the preprocess vignette, probably nested under the heading "Replace missing continuous data with the median value of that feature" (also maybe rephrase this heading to make it more succinct). Show how to call both preprocess_data and run_ml in one code chunk.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Impute missing values after the train/test split
3 participants