
Questions on hyperparameter distributions and validation percentage #174

Open

ClimbsRocks opened this issue Mar 29, 2016 · 0 comments

@ClimbsRocks (Owner)
Question from @MelvinDunn that I'm documenting here:

Had one question while I was looking at the ol' machina:

- How does this machine determine the starting points for hyperparams?
- How does it determine the validation size? (Couldn't find it.)

Sorry, I was interested, and while I know I could easily just look at the
code myself, I thought you would know off the top of your head.

I'm extremely interested in AutoML, and I think this machine is, well,
wonderful.

Thanks again,

Melvin

My response:
I love curiosity; thanks for continuing to ask questions!

  1. We use scikit-learn's RandomizedSearchCV to find the optimal hyperparameters. It samples each parameter randomly from the distributions we give it; those distributions are defined in pySetup/parameterMakers. (There's a sketch of how this works right after this list.)
  2. Right now the validation size is hard-coded in, and it's a pretty large split. I've experimented with different values, but it's somewhere around 20-40%, depending on the size of the input data. The exception is data like Numer.ai's that has a dedicated validationSplit column, which must be flagged in the dataDescription row (where we specify what type of data each column holds). In that case we just use that validation split directly. (Both paths are sketched below as well.)

Keep the questions coming!
