Local data sync into clearml-data #1246

Open
nikiniki1 opened this issue Apr 15, 2024 · 1 comment

Comments

@nikiniki1

Hi!
I'm planning to use clearml-data like this:

  1. I have a dataset of roughly 700 GB. For each problem I select a subsample of it to use as train/test data, and I pass in only a txt file listing the paths (data_path) of the subsample.
  2. With clearml, I have to initialize a dataset (dataset = Dataset()) and then call dataset.sync_folder(). But used this way, clearml chunks my data and uploads it to file storage, so I end up with duplicates of the data.
  3. I don't want clearml to duplicate the data. I just want it to monitor the shared folder that holds all the data and show only the paths of the selected files.
    How can I solve this problem?
@ainoam
Collaborator

ainoam commented Apr 15, 2024

@nikiniki1 Dataset.sync_folder is intended to do exactly that: synchronize data between two locations.
If your use case uses a single location, I think Dataset.add_external_files is what you need.

Does this help?
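
For reference, a minimal sketch of how the Dataset.add_external_files suggestion might look for the txt-of-paths workflow described in the question. The helper names (read_subsample_links, register_subsample) and the dataset/project names passed in are hypothetical, and the sketch assumes the shared folder is reachable under the same local path from every machine, so that file:// links resolve everywhere. It has not been run against a 700 GB dataset.

```python
from pathlib import Path


def read_subsample_links(list_file):
    """Turn a text file of local paths (one per line) into file:// links."""
    links = []
    for line in Path(list_file).read_text().splitlines():
        line = line.strip()
        if line:  # skip blank lines
            # as_uri() needs an absolute path, so resolve() first
            links.append(Path(line).resolve().as_uri())
    return links


def register_subsample(list_file, name, project):
    """Hypothetical helper: create a dataset version that only links to files."""
    # Imported here so read_subsample_links works without clearml installed.
    from clearml import Dataset

    dataset = Dataset.create(dataset_name=name, dataset_project=project)
    # Registers the links as external files; per the suggestion above,
    # the data itself is expected to stay where it is in the shared folder.
    dataset.add_external_files(source_url=read_subsample_links(list_file))
    dataset.upload()    # assumption: with links only, this stores dataset state, not file contents
    dataset.finalize()  # close this dataset version
    return dataset.id
```

Calling register_subsample("subsample.txt", "my-subsample", "my-project") would then create one dataset version per selected subsample, each listing only the chosen paths.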
