Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make permutation importance test optional, functionality to use any outcome variables, optional hyperparameter input #11

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

zenalapp
Copy link

@zenalapp zenalapp commented Nov 11, 2019

Fixes #6
Fixes #7
Fixes #8
Fixes #10

@zenalapp zenalapp changed the title make permutation importance test optional & functionality to use any outcome variables make permutation importance test optional, functionality to use any outcome variables, optional hyperparameter input Nov 11, 2019
@BTopcuoglu
Copy link
Collaborator

BTopcuoglu commented Nov 11, 2019

get_results(dataset, models, split_number, outcome="dx") errors out when outcome is defined by user.
pipeline(dataset, models, split_number, outcome="dx") works. So it must be an issue when the argument is being passed from get_results to pipeline function.

@BTopcuoglu
Copy link
Collaborator

BTopcuoglu commented Nov 12, 2019

Define outcome variable and permutation logical as arguments that can be passed in the command line. We can now run from command line with:

Rscript code/learning/main.R 1 "L2_Logistic_Regression" "dx" 0

@BTopcuoglu
Copy link
Collaborator

BTopcuoglu commented Nov 12, 2019

Previous changes regarding setting outcome and perm on the command line work when the user defines the outcome (e.g. "dx") as an argument. However, if they leave that argument empty (passed as NA e.g. Rscript code/learning/main.R 1 "L2_Logistic_Regression" 0), the rest of the pipeline breaks (the first column does not get selected).

  1. I changed the order of the arguments in get_aucs function and then instead of NULL, I uses NA:

get_results <- function(dataset, models, split_number, perm=T, outcome=NA, hyperparameters=NULL)

  1. Changed pipeline function to have NA as well.

pipeline <- function(dataset, model, split_number, outcome=NA, hyperparameters=NULL, perm=T)

  1. Edited outcome=NULL argument from tuning_grid function because outcome infor should be decided in the previous functions already:

tuning_grid <- function(train_data, model, outcome, hyperparameters=NULL)

@BTopcuoglu
Copy link
Collaborator

BTopcuoglu commented Nov 12, 2019

With these changes, it looks like we Fixed #6 and #7. Next step is checking if permutation works as we want in #8.

@BTopcuoglu
Copy link
Collaborator

BTopcuoglu commented Nov 12, 2019

permutation_importance function doesn't work.
Error:

Error in -sym(first_outcome) : invalid argument to unary operator
Calls: get_results ... <Anonymous> -> vars_select_eval -> map_if -> map -> .f

Caught the bug in line 87 and changed from first_outcome to outcome:

  non_correlated_otus <- full %>%
    select(-correlated_otus) %>%
    select(-sym(outcome)) %>%
    colnames()

@zenalapp
Copy link
Author

Nooo I was worried this wouldn't work but I wasn't doing permutation importance so I didn't catch it. I can look into another option if you don't know of one

@BTopcuoglu
Copy link
Collaborator

Nooo I was worried this wouldn't work but I wasn't doing permutation importance so I didn't catch it. I can look into another option if you don't know of one

I'm running it now with the change I've made (it was passing first_outcome which is not a column that can be selected but now uses sym("dx") which should work theoretically, I'll keep you posted.

@zenalapp
Copy link
Author

But don't we want to have dx not hard-coded?

@BTopcuoglu
Copy link
Collaborator

BTopcuoglu commented Nov 12, 2019

But don't we want to have dx not hard-coded?

No I know, I have it as:

  non_correlated_otus <- full %>%
    select(-correlated_otus) %>%
    select(-sym(outcome)) %>%
    colnames()

Instead of what it was before:

non_correlated_otus <- full %>%
    select(-correlated_otus) %>%
    select(-sym(first_outcome)) %>%
    colnames()

@BTopcuoglu
Copy link
Collaborator

BTopcuoglu commented Nov 12, 2019

Our previous attempt was unsuccessful - must be a bug with tidyverse. I made a new change:

  non_correlated_otus <- full %>%
    select(-correlated_otus)
  
  non_correlated_otus[,outcome] <- NULL
  
  non_correlated_otus <- non_correlated_otus %>%
    colnames()

Not the most beautiful code snippet I've written but it'll do:)

@BTopcuoglu
Copy link
Collaborator

BTopcuoglu commented Nov 13, 2019

#8 We now made permutation importance optional but the data structure to run permutation is still hardcoded. We need to come back to that and fix it.

I'll now check if user-defined hyperparameters work #10 .

@BTopcuoglu
Copy link
Collaborator

The NULL options for hyperparameters are currently specific to my CRC classification problem (except random forest, where we implement Pat's code: mtry <- floor(seq(1, n_features, length=6)) ). So those need to be adjusted and expanded in the future, but overall, I'm able to set up user-defined hyperparameters as a list and it works. Fixed #10.

@BTopcuoglu BTopcuoglu added enhancement New feature or request good first issue Good for newcomers labels Dec 4, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
None yet
2 participants