Automated estimation of the number of resamplings given the size of the training data #231

Open
ivan-marroquin opened this issue Nov 2, 2022 · 1 comment
Labels
contributors Proposed by contributors. enhancement New feature or request good first issue Good for newcomers

Comments

@ivan-marroquin

Is your feature request related to a problem? Please describe.
When defining the "cv" splitter with the Subsample class, the "n_resamplings" and "n_samples" parameters must be provided. If "n_resamplings" is too small, the following warning is raised:

"WARNING: at least one point of training set belongs to every resamplings. Increase the number of resamplings"

Describe the solution you'd like
I think it would be beneficial to have an automated way to estimate "n_resamplings" given "n_samples". For instance, a user could fix "n_samples" as follows: n_samples = int(0.25 * gral_train_inputs.shape[0])

Then "n_resamplings" would be determined according to the size of the training data.
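
One way such an estimate could work is sketched here. It is not MAPIE code: `estimate_n_resamplings` and its `delta` parameter are hypothetical names, and the sketch assumes that each resampling draws its points uniformly and independently of the other resamplings. It returns the smallest "n_resamplings" for which a union bound keeps the probability that some training point lands in every resampling (the condition behind the warning) below `delta`.

```python
import math


def estimate_n_resamplings(n_train, n_samples, replace=True, delta=0.01):
    """Heuristic estimate of n_resamplings (hypothetical helper, not part of MAPIE).

    Assumes uniform, independent resamplings. `delta` is the tolerated
    probability that at least one training point is in every resampling.
    """
    if n_train < 2 or n_samples < 1:
        raise ValueError("need n_train >= 2 and n_samples >= 1")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")
    if not replace and n_samples >= n_train:
        raise ValueError("without replacement, n_samples must be < n_train")

    # Probability that a given point is drawn in a single resampling.
    if replace:
        p = 1.0 - (1.0 - 1.0 / n_train) ** n_samples
    else:
        p = n_samples / n_train

    # Union bound: P(some point is in all K resamplings) <= n_train * p**K.
    # Solving n_train * p**K <= delta for K gives the expression below.
    k = math.log(n_train / delta) / -math.log(p)
    return max(1, math.ceil(k))


# Example with n_samples fixed to 25% of a training set of 1000 points.
n_train = 1000
n_samples = int(0.25 * n_train)
print(estimate_n_resamplings(n_train, n_samples))
```

Under these assumptions, and with "n_samples" kept at a fixed fraction of the training size, the estimate grows only logarithmically with the size of the training data.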

Describe alternatives you've considered
In my case, I fixed "n_samples" as shown above. But now I have to use trial and error to find the minimum "n_resamplings" that avoids the warning message and ensures good statistical results.
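
The trial-and-error search could itself be automated. The sketch below is only an illustration: `min_resamplings_by_simulation` is a hypothetical helper that uses Python's `random` module instead of MAPIE's Subsample class, so its draws will not match the ones MAPIE makes. It keeps adding resamplings until no training index has appeared in all of them, which is the situation the warning asks for.

```python
import random


def min_resamplings_by_simulation(n_train, n_samples, replace=True,
                                  max_resamplings=1000, seed=0):
    """Smallest number of simulated resamplings after which no training
    index belongs to every resampling (hypothetical helper, not MAPIE)."""
    rng = random.Random(seed)
    indices = range(n_train)
    # Indices that have been drawn in every resampling so far.
    in_every_resampling = set(indices)
    for k in range(1, max_resamplings + 1):
        if replace:
            drawn = set(rng.choices(indices, k=n_samples))
        else:
            drawn = set(rng.sample(indices, n_samples))
        in_every_resampling &= drawn
        if not in_every_resampling:
            return k
    raise RuntimeError("no suitable value found up to max_resamplings")


n_train = 1000
n_samples = int(0.25 * n_train)
print(min_resamplings_by_simulation(n_train, n_samples))
```

The result is a single random realization and changes with `seed`, so it is best read as a starting point, with some margin added on top, rather than as a guaranteed minimum.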

Kind regards,
Ivan

@ivan-marroquin ivan-marroquin added the enhancement New feature or request label Nov 2, 2022
@vincentblot28 vincentblot28 added this to Need triage in Developments via automation Mar 2, 2023
@vincentblot28 vincentblot28 moved this from Need triage to Bugs to be fixed (Low Priority) in Developments Mar 2, 2023
@vincentblot28 vincentblot28 added the good first issue Good for newcomers label Mar 2, 2023
@vincentblot28
Collaborator

Hi @ivan-marroquin, thank you for your issue. Indeed, automatically estimating the number of resamplings is a good idea. Do you have a theoretical formula for it? If so, feel free to share it with us, or even implement it in MAPIE and open a Pull Request.
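
One candidate formula, offered as a sketch and not as an established MAPIE result, follows from a union bound. The symbols are assumptions introduced here: $n$ is the training size, $m$ is "n_samples", $K$ is "n_resamplings", and $\delta$ is the tolerated probability of triggering the warning. Resamplings are assumed to be uniform and independent of each other.

```latex
\begin{align}
p &=
  \begin{cases}
    1 - \left(1 - \tfrac{1}{n}\right)^{m} & \text{sampling with replacement} \\[4pt]
    \tfrac{m}{n} & \text{sampling without replacement}
  \end{cases}
  && \text{(a given point is in one resampling)} \\
\Pr(\text{a given point is in all } K \text{ resamplings}) &= p^{K} \\
\Pr(\text{some point is in all } K \text{ resamplings}) &\le n\,p^{K}
  && \text{(union bound over the } n \text{ points)} \\
n\,p^{K} \le \delta \;&\Longleftrightarrow\; K \ge \frac{\ln(n/\delta)}{\ln(1/p)}
\end{align}
```

For example, with $n = 1000$, $m = 250$ drawn with replacement, and $\delta = 0.01$, this gives $p \approx 0.221$ and $K \ge 7.63$, so 8 resamplings would satisfy the bound.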

@thibaultcordier thibaultcordier added the contributors Proposed by contributors. label Jul 4, 2023