Provide test datasets where labels are hidden #1188

Open
ArturDev42 opened this issue Apr 20, 2023 · 2 comments

Comments

@ArturDev42

ArturDev42 commented Apr 20, 2023

Description

My understanding is that when creating a task, the selected estimation procedure defines the train/test splits of the dataset. When creating a run, a model is evaluated on the given dataset using some estimation procedure such as cross-validation.
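To make the split mechanics concrete, here is a minimal sketch (not OpenML's actual implementation) of how a fixed k-fold estimation procedure can assign each row to exactly one test fold, so that every run on a task is evaluated on identical partitions. The helper names `make_fold_assignment` and `split_for_fold` are hypothetical. The sketch also shows why cross-validation splits hide nothing on their own: with two or more folds, every row, and therefore every label, appears in the training part of some other fold.

```python
import random


def make_fold_assignment(n_rows: int, n_folds: int, seed: int = 0) -> list[int]:
    """Assign each row index to exactly one test fold in 0..n_folds-1.

    Hypothetical helper: it only illustrates that a task fixes its splits
    once, so every run is evaluated on the same train/test partitions.
    """
    rows = list(range(n_rows))
    random.Random(seed).shuffle(rows)  # same seed -> same shuffle -> same folds
    folds = [0] * n_rows
    for position, row in enumerate(rows):
        # Round-robin over the shuffled order keeps fold sizes within one row.
        folds[row] = position % n_folds
    return folds


def split_for_fold(folds: list[int], fold: int) -> tuple[list[int], list[int]]:
    """Return (train_rows, test_rows) for one cross-validation fold."""
    train = [i for i, f in enumerate(folds) if f != fold]
    test = [i for i, f in enumerate(folds) if f == fold]
    return train, test
```

For example, `make_fold_assignment(10, 5, seed=42)` yields five test folds of two rows each; across the five folds, every row is used for testing exactly once and for training four times.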

Is it possible to provide hidden test sets for evaluating solutions? There is a "Note" about this on https://openml.github.io/OpenML/#circles-under-construction in the section about Tasks, but I'm not sure whether this version of the docs is up to date.

For my use case, this issue is connected to openml/openml-python#1231. I would like users to be able to upload predictions for a test set whose labels are hidden, so that they have no access to those labels. Is this possible by default? Or would I need to create a separate task with a separate dataset that contains only X, but no labels? What would be the recommended approach here?

Thanks!

@PGijsbers
Contributor

PGijsbers commented Apr 24, 2023

There is currently no way to hide any data from users. The only way to make sure users have absolutely no access to the test labels is to not upload them. This means uploading two datasets: one with the training data and labels, and one with the test data without labels. This lets users easily access both datasets, but it still doesn't allow them to upload predictions for the hidden test set; that is not supported. You would need some alternative solution for submitting and processing these predictions. As far as I am aware, there are currently no plans to support this kind of competition-style hidden data.
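The two-dataset workaround, together with the separate scoring step it requires, can be sketched as follows. This is a minimal illustration under stated assumptions, not an OpenML feature: rows are plain dicts with a hypothetical `id` column, and `split_for_competition` and `score_submission` are hypothetical helpers that an organizer would run outside OpenML. Only `train` and `test_features` would be published; `answer_key` stays private, and accuracy is used as an example metric.

```python
from typing import Any, Sequence

Row = dict[str, Any]  # assumption: one dict per row, with an "id" column


def split_for_competition(rows: Sequence[Row], target: str, test_ids: set[int]):
    """Split labeled rows into public train rows, public test features,
    and a private answer key (hypothetical helper)."""
    train, test_features, answer_key = [], [], {}
    for row in rows:
        if row["id"] in test_ids:
            # Publish the features only; the label goes into the private key.
            test_features.append({k: v for k, v in row.items() if k != target})
            answer_key[row["id"]] = row[target]
        else:
            train.append(dict(row))  # training rows keep their labels
    return train, test_features, answer_key


def score_submission(predictions: dict[int, Any], answer_key: dict[int, Any]) -> float:
    """Accuracy of submitted predictions, computed privately by the organizer."""
    missing = answer_key.keys() - predictions.keys()
    if missing:
        raise ValueError(f"missing predictions for ids: {sorted(missing)}")
    correct = sum(predictions[i] == label for i, label in answer_key.items())
    return correct / len(answer_key)
```

Usage, with toy data:

```python
rows = [{"id": 1, "x": 0.5, "y": "a"}, {"id": 2, "x": 1.5, "y": "b"}, {"id": 3, "x": 2.5, "y": "b"}]
train, test_features, key = split_for_competition(rows, target="y", test_ids={2, 3})
score_submission({2: "b", 3: "a"}, key)  # → 0.5
```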

Circles are for sharing entire datasets, but only with a set of trusted users. This is currently on our roadmap, but I would not expect it before fall this year (this is not a promise).

If you want, you can create an OpenML feature request on our OpenML discussion board. Be detailed: specify use cases and the target audience, give examples of the feature being used, and describe how you think it would work with OpenML from a user's perspective (in this case, from the perspective of both the competition organizer and the competition participant). This makes it easier for us to discuss the idea and decide whether it's something we want to support.

@mfeurer mfeurer transferred this issue from openml/openml-python Apr 25, 2023
@mfeurer
Contributor

mfeurer commented Apr 25, 2023

Hey, I transferred this issue to the main OpenML issue tracker, as this is not an issue with the Python API.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants