I have a dataset of roughly 700 GB. When I want to solve a problem, I select a subsample from it and use that as train/test data, feeding in only a txt file with the paths (data_path) of the subsample.
So, when I use ClearML, I have to initialize a dataset (dataset = Dataset.create()) and then call dataset.sync_folder(). But if I use it this way, ClearML chunks my data and uploads it to the file storage, so I end up with duplicates of the data.
I don't want ClearML to duplicate the data; I just want it to monitor the shared folder containing all the data and record only the paths of the selected files.
How can I solve this problem?
@nikiniki1 Dataset.sync_folder is intended to do exactly that: synchronize data between two locations.
Since your use case involves a single location, I think Dataset.add_external_files is what you need.
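A minimal sketch of that approach, assuming your subsample is listed in a plain-text manifest (one path per line; the manifest name `subsample.txt`, the dataset name, and the `read_manifest` helper are illustrative, not part of the ClearML API). `add_external_files` registers only links to the files, so nothing is copied to the ClearML file server:

```python
from pathlib import Path


def read_manifest(manifest_path):
    """Return the file paths listed in a plain-text manifest, one per line."""
    return [
        line.strip()
        for line in Path(manifest_path).read_text().splitlines()
        if line.strip()
    ]


if __name__ == "__main__":
    from clearml import Dataset

    # Create a dataset version that holds only references (links),
    # not copies, of the selected files.
    dataset = Dataset.create(
        dataset_name="subsample",      # illustrative name
        dataset_project="my_project",  # illustrative name
    )
    for path in read_manifest("subsample.txt"):
        # Registers a link to the file; no upload to the file server.
        dataset.add_external_files(source_url=path)
    dataset.finalize()
```

At training time you can then call `Dataset.get(...).get_local_copy()` (or read the registered links directly) without the 700 GB folder ever being re-uploaded.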
Hi!
I'm going to use clearml data like this: