Cache run predictions #1191

Open
PGijsbers opened this issue Dec 5, 2022 · 2 comments
Labels
enhancement Run OpenML concept

Comments

@PGijsbers
Collaborator

Problem

Run predictions are not cached when they are downloaded. Note that predictions are only downloaded when get_metric_fn is called in the first place (this is the desired behavior, since the description file already contains precomputed evaluations).

MWE

CLI: ls ~/.openml/org/openml/www/runs/10591753/
Output: ls: /Users/pietergijsbers/.openml/org/openml/www/runs/10591753/: No such file or directory

Execute:

import openml
import logging
from sklearn.metrics import accuracy_score

logging.basicConfig(level=logging.DEBUG)
run = openml.runs.get_run(10591753)
run.get_metric_fn(accuracy_score)

Output:

>>> run = openml.runs.get_run(10591753)
INFO:root:Starting [get] request for the URL https://www.openml.org/api/v1/xml/run/10591753
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): www.openml.org:443
DEBUG:urllib3.connectionpool:https://www.openml.org:443 "GET /api/v1/xml/run/10591753 HTTP/1.1" 307 336
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.openml.org:443
DEBUG:urllib3.connectionpool:https://api.openml.org:443 "GET /api/v1/xml/run/10591753 HTTP/1.1" 200 5112
INFO:root:0.1340468s taken for [get] request for the URL https://www.openml.org/api/v1/xml/run/10591753

>>> run.get_metric_fn(accuracy_score)
INFO:root:Starting [get] request for the URL https://www.openml.org/data/download/22111640/predictions.arff
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): www.openml.org:443
DEBUG:urllib3.connectionpool:https://www.openml.org:443 "GET /data/download/22111640/predictions.arff HTTP/1.1" 307 352
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.openml.org:443
DEBUG:urllib3.connectionpool:https://api.openml.org:443 "GET /data/download/22111640/predictions.arff HTTP/1.1" 200 None
INFO:root:0.0710640s taken for [get] request for the URL https://www.openml.org/data/download/22111640/predictions.arff

array([0.76623377, 0.5974026 , 0.72727273, 0.68831169, 0.7012987 ,
       0.75324675, 0.77922078, 0.77922078, 0.73684211, 0.67105263])

CLI: ls ~/.openml/org/openml/www/runs/10591753/
Output: description.xml

Note that there is no sign of the predictions ARFF file on disk, as you would expect from reading the source code.
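
The before/after `ls` check can also be scripted. This is a minimal sketch, assuming the cache layout seen in the `ls` output (`~/.openml/org/openml/www/runs/<run_id>/`). The helper name `cached_run_files` and its `cache_root` parameter are hypothetical and not part of openml-python.

```python
from __future__ import annotations

from pathlib import Path


def cached_run_files(run_id: int, cache_root: Path) -> list[str]:
    """Return the sorted file names cached for a run, or [] if nothing is cached.

    `cache_root` is assumed to be the runs cache directory, e.g.
    ~/.openml/org/openml/www/runs as seen in the `ls` output.
    """
    run_dir = Path(cache_root) / str(run_id)
    if not run_dir.is_dir():
        # Nothing has been cached for this run yet.
        return []
    return sorted(p.name for p in run_dir.iterdir() if p.is_file())
```

Calling `cached_run_files(10591753, Path.home() / ".openml/org/openml/www/runs")` before and after `run.get_metric_fn(...)` should mirror the two `ls` calls in the MWE.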

@mfeurer mfeurer added the Run OpenML concept label Feb 20, 2023
@mfeurer
Collaborator

mfeurer commented Jun 13, 2023

After an offline discussion with @PGijsbers we agreed that this should be an optional feature, i.e. that caching is disabled by default, but can be enabled.
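
To make the opt-in idea concrete, here is a rough sketch of what "disabled by default, but can be enabled" could look like. It is not the openml-python implementation: the function `load_predictions`, the `cache_predictions` flag, and the injected `download` callable are all hypothetical names chosen for illustration.

```python
from pathlib import Path
from typing import Callable


def load_predictions(
    url: str,
    cache_file: Path,
    download: Callable[[str], str],
    cache_predictions: bool = False,  # hypothetical opt-in flag, off by default
) -> str:
    """Return the predictions text for `url`, optionally caching it on disk."""
    # Only consult the cache when the caller has opted in.
    if cache_predictions and cache_file.is_file():
        return cache_file.read_text(encoding="utf-8")

    content = download(url)

    # Only write to disk when the caller has opted in.
    if cache_predictions:
        cache_file.parent.mkdir(parents=True, exist_ok=True)
        cache_file.write_text(content, encoding="utf-8")
    return content
```

With the default `cache_predictions=False` the function always downloads and never touches the disk, which matches the current behavior described in this issue. Passing the downloader in as a parameter keeps the sketch independent of any particular HTTP client.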

@PGijsbers
Collaborator Author

There are definitely cases where this is useful (experimenting with evaluation metrics or ensembling), but the average user probably doesn't load the same runs many times. Because cached predictions would quickly occupy a lot of disk space, we think opt-in is better.
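
For anyone who does opt in, the disk-space cost could be checked with something like the sketch below. It assumes cached predictions would be stored as `predictions.arff` somewhere under the runs cache directory. That file name is taken from the download URL and is an assumption, as is the helper name `predictions_cache_size`.

```python
from pathlib import Path


def predictions_cache_size(cache_root: Path, filename: str = "predictions.arff") -> int:
    """Total size in bytes of all files named `filename` found below `cache_root`."""
    return sum(
        p.stat().st_size
        for p in Path(cache_root).rglob(filename)  # search all run subdirectories
        if p.is_file()
    )
```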
