stop caching data in a relative directory #196

Open
davidegraff opened this issue Feb 16, 2023 · 1 comment

Comments

@davidegraff

Describe the problem
TDC caches downloaded data to disk for future use, but by default it writes this cache to the relative directory ./data. If I then use TDC from a different working directory on the same machine without pointing it at the previous location, it downloads the data again, which wastes disk space on duplicate copies.

Describe the solution you'd like
Use a "global" cache directory that is absolute for a user. It's standard practice for most applications to cache downloaded data to a hidden directory like $HOME/.cache/PACKAGE by default (e.g., wandb, pip, huggingface, black). A user can then override this default location at runtime if desired, for example through an environment variable (see: huggingface).

I currently have this manually implemented in my TDC client code like so:

import os
from pathlib import Path

from tdc.single_pred import ADME

# Use TDC_DATASETS_CACHE if it is set; otherwise fall back to ~/.cache/TDC.
TDC_CACHE = os.getenv("TDC_DATASETS_CACHE", Path.home() / ".cache" / "TDC")
data = ADME(name='Caco2_Wang', path=TDC_CACHE)

but this is cumbersome to do everywhere. It would be nice for TDC to do this by default.

You can do this by changing the type of the path parameter from str to Optional[str] with a default value of None. A value of None tells TDC to use the directory named by TDC_DATASETS_CACHE in the environment, falling back to a default such as $HOME/.cache/TDC when that variable is unset. This lets a user (1) configure the default location of TDC downloads globally through the environment, and (2) avoid redownloading datasets every time they change directories.
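
For concreteness, here is a minimal sketch of how that resolution could look. It is not TDC's actual code: the helper name resolve_cache_dir and its create flag are hypothetical. Only the TDC_DATASETS_CACHE variable and the ~/.cache/TDC fallback come from the snippet above.

```python
import os
from pathlib import Path
from typing import Optional

# Names taken from the client snippet in this issue.
CACHE_ENV_VAR = "TDC_DATASETS_CACHE"
DEFAULT_CACHE = Path.home() / ".cache" / "TDC"


def resolve_cache_dir(path: Optional[str] = None, create: bool = True) -> Path:
    """Hypothetical helper (not part of TDC) that picks the cache directory.

    Precedence: explicit `path` argument, then the TDC_DATASETS_CACHE
    environment variable, then ~/.cache/TDC.
    """
    if path is not None:
        chosen = Path(path)
    else:
        env_value = os.environ.get(CACHE_ENV_VAR)
        # Treat an unset or empty variable as "use the default".
        chosen = Path(env_value) if env_value else DEFAULT_CACHE

    # Make the result absolute so it no longer depends on the working directory.
    chosen = chosen.expanduser().resolve()
    if create:
        chosen.mkdir(parents=True, exist_ok=True)
    return chosen
```

A data loader whose path parameter defaults to None could call something like this once at the top of its constructor. Existing callers that pass an explicit path would keep their current behavior, since an explicit argument takes precedence over the environment.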

@kexinhuang12345
Collaborator

kexinhuang12345 commented Feb 24, 2023

That's a great point! Will be working on it! Let us know if you would like to make a PR for it, thanks!!

@kexinhuang12345 kexinhuang12345 self-assigned this Feb 24, 2023
@kexinhuang12345 kexinhuang12345 added the enhancement New feature or request label Feb 24, 2023