Describe the bug
When I divide a dataset with [0.2, 0.2, 0.2, 0.94], the resulting sub_datasets are wrong.
Of the 1st, 2nd, 3rd, and 4th sub_datasets, the 3rd has 0 samples.
I then tried [0.2, 0.2, 0.2, 0.2, 0.92] and found that both the 3rd and 4th sub_datasets have 0 samples.
The cause is in the "_create_division_indices_ranges" function in "utils.py":
the statement "start_idx += end_idx" should be "start_idx = end_idx".
Expected Results
partition should contain 5 datasets with sample sizes of 10000, 10000, 10000, 10000, and 46000 respectively (total 50000 samples).
Actual Results
The resulting sizes are 10000, 10000, 0, 0, ...
Hi @jmsw4bn. Thanks for pointing this out and for working out the fix. I've opened a PR that fixes it.
As for the expected results: [0.2, 0.2, 0.2, 0.94] is not a valid division, since the values sum to more than 1, and an error will be raised in that case. It would be unclear where the 0.94 should come from (it would have to overlap with other parts that are expected to be separate). Alternatively, I think you might have meant 0.02, ...; then the values would sum to 1 and everything would work.
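The rule described here, that fractions summing to more than 1 are rejected, could be checked along these lines. This is a minimal sketch rather than the library's actual validation: `check_division` is a hypothetical helper, and the error messages and the floating-point tolerance are assumptions.

```python
import math


def check_division(division):
    """Raise ValueError if the fractions cannot describe non-overlapping parts.

    Hypothetical helper for illustration; not taken from the library.
    """
    if any(fraction < 0 for fraction in division):
        raise ValueError("Division fractions must be non-negative.")
    total = math.fsum(division)
    # Allow a tiny tolerance so that rounding noise in a sum of 1 still passes.
    if total > 1.0 and not math.isclose(total, 1.0):
        raise ValueError(f"Division fractions sum to {total}, which exceeds 1.")


check_division([0.02, 0.02, 0.02, 0.94])  # sums to 1: accepted

try:
    check_division([0.2, 0.2, 0.2, 0.94])  # sums to more than 1: rejected
except ValueError as err:
    print(err)
```

A sum below 1 is accepted in this sketch, which would simply leave part of the dataset unassigned.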
I am sorry, I wrote the wrong values.
I actually tested the code with "division=[0.02, 0.02, 0.02, 0.02, 0.92]".
These values sum to 1, yet the 3rd and 4th sub_datasets have 0 samples.
You can verify the error by debugging the following code; the output shows that the 5 sub_datasets in "partition" have 1000, 1000, 0, 0, and 40000 samples respectively (the correct output would be 1000, 1000, 1000, 1000, and 46000):
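Steps/Code to Reproduce
from flwr_datasets import FederatedDataset
from flwr_datasets.utils import divide_dataset
fds = FederatedDataset(dataset="cifar10", partitioners={"train": 1})
tds = fds.load_partition(0, "train")
# Division values as corrected in the follow-up comment; they sum to 1.
partition = divide_dataset(dataset=tds, division=[0.02, 0.02, 0.02, 0.02, 0.92])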
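The counts reported in this thread can be reproduced without downloading CIFAR-10 by replaying only the index arithmetic. The sketch below is a simplified stand-in, not the library's actual code: `division_ranges` and its `buggy` flag are hypothetical names, and it assumes each range ends at the running sum of `int(partition_size * fraction)`. The two variants differ only in how `start_idx` is updated.

```python
def division_ranges(partition_size, division, buggy=False):
    """Return one index range per fraction in `division`.

    Hypothetical stand-in for the range computation discussed in this thread.
    """
    ranges = []
    start_idx = 0
    end_idx = 0
    for fraction in division:
        end_idx += int(partition_size * fraction)
        ranges.append(range(start_idx, end_idx))
        if buggy:
            start_idx += end_idx  # reported bug: start runs ahead of end
        else:
            start_idx = end_idx  # proposed fix: next range starts where this one ended
    return ranges


division = [0.02, 0.02, 0.02, 0.02, 0.92]
buggy_sizes = [len(r) for r in division_ranges(50000, division, buggy=True)]
fixed_sizes = [len(r) for r in division_ranges(50000, division)]
print(buggy_sizes)  # [1000, 1000, 0, 0, 40000]
print(fixed_sizes)  # [1000, 1000, 1000, 1000, 46000]
```

With the `+=` update, `start_idx` overtakes `end_idx` from the third fraction on, so those ranges are empty; the last range is non-empty only because its large fraction pushes `end_idx` back past `start_idx`.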