flwr_datasets with custom/local dataset #3201
Comments
It seems that the problem is in the following lines:
flower/datasets/flwr_datasets/federated_dataset.py, lines 43 to 44 in f78ef0a
flower/datasets/flwr_datasets/federated_dataset.py, lines 237 to 239 in f78ef0a
Evidenced by:
fds.load_partition(1, split="train")
And the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[137], line 1
----> 1 fds.load_partition(1, split="train")
File ~...\Lib\site-packages\flwr_datasets\federated_dataset.py:131, in FederatedDataset.load_partition(self, partition_id, split)
108 """Load the partition specified by the idx in the selected split.
109
110 The dataset is downloaded only when the first call to `load_partition` or
(...)
128 Single partition from the dataset split.
129 """
130 if not self._dataset_prepared:
--> 131 self._prepare_dataset()
132 if self._dataset is None:
133 raise ValueError("Dataset is not loaded yet.")
File ~...\Lib\site-packages\flwr_datasets\federated_dataset.py:237, in FederatedDataset._prepare_dataset(self)
216 def _prepare_dataset(self) -> None:
217 """Prepare the dataset (prior to partitioning) by download, shuffle, replit.
218
219 Run only ONCE when triggered by load_* function. (In future more control whether
(...)
235 happen before the resplitting.
236 """
--> 237 self._dataset = datasets.load_dataset(
238 path=self._dataset_name, name=self._subset
239 )
240 if self._shuffle:
241 # Note it shuffles all the splits. The self._dataset is DatasetDict
242 # so e.g. {"train": train_data, "test": test_data}. All splits get shuffled.
243 self._dataset = self._dataset.shuffle(seed=self._seed)
File ~...\Lib\site-packages\datasets\load.py:2538, in load_dataset(path, name, data_dir, data_files, split, cache_dir, features, download_config, download_mode, verification_mode, ignore_verifications, keep_in_memory, save_infos, revision, token, use_auth_token, task, streaming, num_proc, storage_options, trust_remote_code, **config_kwargs)
2536 if data_files is not None and not data_files:
2537 raise ValueError(f"Empty 'data_files': '{data_files}'. It should be either non-empty or None (default).")
-> 2538 if Path(path, config.DATASET_STATE_JSON_FILENAME).exists():
2539 raise ValueError(
2540 "You are trying to load a dataset that was saved using `save_to_disk`. "
2541 "Please use `load_from_disk` instead."
2542 )
2544 if streaming and num_proc is not None:
File ~...\Lib\pathlib.py:1162, in Path.__init__(self, *args, **kwargs)
1159 msg = ("support for supplying keyword arguments to pathlib.PurePath "
1160 "is deprecated and scheduled for removal in Python {remove}")
1161 warnings._deprecated("pathlib.PurePath(**kwargs)", msg, remove=(3, 14))
-> 1162 super().__init__(*args)
File ~...\Lib\pathlib.py:373, in PurePath.__init__(self, *args)
371 path = arg
372 if not isinstance(path, str):
--> 373 raise TypeError(
374 "argument should be a str or an os.PathLike "
375 "object where __fspath__ returns a str, "
376 f"not {type(path).__name__!r}")
377 paths.append(path)
378 self._raw_paths = paths
TypeError: argument should be a str or an os.PathLike object where __fspath__ returns a str, not 'Dataset'
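The last frames of the traceback can be reproduced with the standard library alone, which may help confirm that the failure comes from passing an in-memory object where a name or path string is expected. This is a minimal sketch, not code from flwr_datasets or datasets: the `Dataset` class is a hypothetical stand-in, `"state.json"` is an assumed placeholder for `config.DATASET_STATE_JSON_FILENAME`, and `looks_like_saved_dataset` and `describe_dataset_argument` are illustrative helpers that do not exist in either library.

```python
import os
from pathlib import Path


class Dataset:
    """Hypothetical stand-in for an in-memory dataset object (not the real class)."""


def looks_like_saved_dataset(path) -> bool:
    # Same shape as the check at load.py:2538 in the traceback.
    # "state.json" is an assumed placeholder for the real constant.
    return Path(path, "state.json").exists()


def describe_dataset_argument(dataset) -> str:
    """Classify the argument before it reaches any path handling."""
    if isinstance(dataset, (str, os.PathLike)):
        return "name-or-path"
    return f"in-memory object of type {type(dataset).__name__}"


try:
    # Path() only accepts str or os.PathLike, so this raises TypeError.
    looks_like_saved_dataset(Dataset())
except TypeError as err:
    print(f"TypeError: {err}")

# "some-dataset-name" is an illustrative value only.
print(describe_dataset_argument("some-dataset-name"))
print(describe_dataset_argument(Dataset()))
```

The exact wording of the TypeError message depends on the Python version; the message shown in the traceback above matches the `pathlib.py` frames quoted there.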
Describe the bug
FDS raises a warning for custom datasets:
flower/datasets/flwr_datasets/utils.py, lines 82 to 88 in dcffb48
Steps/Code to Reproduce
I am using the following code (a custom dataset loaded with pandas):
Expected Results
No warning for custom datasets.
Actual Results
In this case the dataset is a custom dataset, but I am receiving the following warning (features were redacted):