Add option to directly sample the disruptive subset during shot list/set splitting #44

Open
1 task
felker opened this issue Dec 5, 2019 · 0 comments

felker commented Dec 5, 2019

Currently, if the testing and training ({train} U {validate}) sets are drawn from the same source shot list, the ratio conf['model']['train_frac'] is used to randomly divide the source shots without regard to the shot classes. The same class-agnostic random division is used to split the training and validation sets according to conf['model']['validation_frac'].
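
A minimal sketch of the class-agnostic split described above. It assumes a hypothetical representation of each shot as a `(shot_number, is_disruptive)` tuple and a hypothetical helper `random_split`; neither is the project's actual shot-list API. It also assumes that `validation_frac` is the fraction of {train} U {validate} assigned to validation. The fractions and the 5% disruptive rate are illustrative only. The total size of each subset is fixed by the fraction, but nothing controls how many disruptive shots end up in each one.

```python
import random
from typing import List, Tuple

# Hypothetical shot representation: (shot_number, is_disruptive).
Shot = Tuple[int, bool]


def random_split(shots: List[Shot], frac: float, seed: int = 0) -> Tuple[List[Shot], List[Shot]]:
    """Class-agnostic split: shuffle all shots and cut at round(frac * N).

    The size of the first subset is within one shot of frac * N, but the
    number of disruptive shots in each subset is left to chance.
    """
    rng = random.Random(seed)
    shuffled = list(shots)
    rng.shuffle(shuffled)
    n_first = int(round(frac * len(shuffled)))
    return shuffled[:n_first], shuffled[n_first:]


if __name__ == "__main__":
    # 1000 illustrative shots, every 20th one disruptive (5%).
    shots = [(i, i % 20 == 0) for i in range(1000)]
    train_frac, validation_frac = 0.8, 0.2  # illustrative values only
    train_val, test = random_split(shots, train_frac, seed=1)
    train, validate = random_split(train_val, 1.0 - validation_frac, seed=2)
    for name, subset in [("train", train), ("validate", validate), ("test", test)]:
        n_dis = sum(d for _, d in subset)
        print(f"{name}: {len(subset)} shots, {n_dis} disruptive")
```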

So, while the division of the overall shot counts will match the desired fractions to within 1/N (where N is the total number of shots), the division of the disruptive and non-disruptive shots among the sets may not be nearly as close to that fraction. This is only a problem when the number of disruptive (or non-disruptive) samples is low and/or the training and testing sets are drawn from different raw lists. Of course, as the number of samples -> infinity, ratios such as N_{validate, disrupt}/N_{training, disrupt} -> conf['model']['validation_frac'].
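
To put a number on "may not be nearly as close": under a class-agnostic random split, the count of disruptive shots landing in the first subset follows a hypergeometric distribution (n_first draws without replacement from N shots, D of which are disruptive). The sketch below compares a Monte Carlo estimate against the exact hypergeometric standard deviation. The function names and the 5%-disruptive, train_frac = 0.8 scenario are hypothetical and chosen only for illustration.

```python
import math
import random
import statistics
from typing import Tuple


def disruptive_fraction_spread(n_shots: int, n_disruptive: int, frac: float,
                               trials: int = 2000, seed: int = 0) -> Tuple[float, float]:
    """Monte Carlo mean and std of the fraction of all disruptive shots that
    a class-agnostic random split places in the first subset."""
    rng = random.Random(seed)
    labels = [True] * n_disruptive + [False] * (n_shots - n_disruptive)
    n_first = int(round(frac * n_shots))
    fractions = []
    for _ in range(trials):
        rng.shuffle(labels)
        fractions.append(sum(labels[:n_first]) / n_disruptive)
    return statistics.mean(fractions), statistics.stdev(fractions)


def hypergeometric_fraction_std(n_shots: int, n_disruptive: int, frac: float) -> float:
    """Exact std of the same fraction, using the hypergeometric variance
    n * p * (1 - p) * (N - n) / (N - 1) with p = D / N."""
    n_first = int(round(frac * n_shots))
    p = n_disruptive / n_shots
    var = n_first * p * (1 - p) * (n_shots - n_first) / (n_shots - 1)
    return math.sqrt(var) / n_disruptive


if __name__ == "__main__":
    # Illustrative numbers only: 5% disruptive shots, train_frac = 0.8.
    for n_shots in (1000, 100000):
        n_dis = n_shots // 20
        std = hypergeometric_fraction_std(n_shots, n_dis, 0.8)
        print(f"N={n_shots}: disruptive fraction in training set = 0.8 +/- {std:.4f}")
    print(disruptive_fraction_spread(1000, 50, 0.8))
```

With these assumed numbers the standard deviation of the disruptive fraction is about 0.055 for 1,000 shots (50 disruptive) but about 0.0055 for 100,000 shots. At a fixed class ratio it shrinks as 1/sqrt(N), which is why the mismatch only matters for small disruptive (or non-disruptive) counts.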

There is no real reason not to explicitly divide the disruptive and non-disruptive classes when splitting the shot sets, so I think we should at least add it as an option, if not make it the default behavior.
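
A minimal sketch of the proposed option: a stratified split that divides each class separately by the same fraction and then merges the results. It uses the same hypothetical `(shot_number, is_disruptive)` tuple representation; `stratified_split` is an illustrative name, not an existing function in the project.

```python
import random
from typing import List, Tuple

# Hypothetical shot representation: (shot_number, is_disruptive).
Shot = Tuple[int, bool]


def stratified_split(shots: List[Shot], frac: float, seed: int = 0) -> Tuple[List[Shot], List[Shot]]:
    """Split disruptive and non-disruptive shots separately by the same frac.

    Each class is divided to within one shot of frac, so the class balance
    of both subsets matches the source list as closely as integer counts allow.
    """
    rng = random.Random(seed)
    first: List[Shot] = []
    second: List[Shot] = []
    for is_disruptive in (True, False):
        group = [s for s in shots if s[1] == is_disruptive]
        rng.shuffle(group)
        n_first = int(round(frac * len(group)))
        first.extend(group[:n_first])
        second.extend(group[n_first:])
    # Shuffle again so the two classes are interleaved within each subset.
    rng.shuffle(first)
    rng.shuffle(second)
    return first, second


if __name__ == "__main__":
    # 1000 illustrative shots, 50 of them disruptive.
    shots = [(i, i % 20 == 0) for i in range(1000)]
    train_val, test = stratified_split(shots, 0.8, seed=1)        # train_frac = 0.8
    train, validate = stratified_split(train_val, 0.8, seed=2)    # validation_frac = 0.2
    for name, subset in [("train", train), ("validate", validate), ("test", test)]:
        print(name, len(subset), sum(d for _, d in subset))
```

The trade-off is that rounding each class separately can move the overall subset sizes away from frac * N by up to one shot per class. If adding a scikit-learn dependency is acceptable, `sklearn.model_selection.train_test_split` with its `stratify` argument provides the same kind of class-preserving split.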

  • Consider renaming train_frac to test_frac (value = 1.0 - train_frac), or to another name that makes it clear that the "training fraction" is further divided between the training and hold-out validation sets.