How to change the total number of users in federated learning experiments? (e.g. https://github.com/tensorflow/federated/blob/main/tensorflow_federated/python/simulation/datasets/emnist.py) #3910

Open
Yeojoon opened this issue May 9, 2023 · 5 comments

@Yeojoon

Yeojoon commented May 9, 2023

Is your feature request related to a problem? Please describe.
I am currently running some FL experiments with the EMNIST dataset in the tensorflow-federated library (https://github.com/tensorflow/federated/blob/main/tensorflow_federated/python/simulation/datasets/emnist.py). The default total number of users for this dataset is 3400 (when only_digits=False). Is there any way to change the number of users for a particular dataset?

If not, would it be possible to add this feature? I believe it would be very helpful to researchers!

Thank you very much in advance for your help!

@Yeojoon Yeojoon added the enhancement New feature or request label May 9, 2023
@zcharles8
Collaborator

Hi @Yeojoon. One easy way to do this would be to subsample client IDs from EMNIST. This gives you a smaller total number of clients, but it also reduces the total number of examples seen, so it's probably not right for all settings.
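A minimal sketch of the subsampling idea, for illustration only. It works on a plain list of IDs so that it runs without TFF installed; the `subsample_client_ids` helper and the `client_NNNN` ID strings are hypothetical stand-ins. With TFF, the list would presumably come from the `client_ids` property of the loaded `ClientData`, and each selected ID would be passed to `create_tf_dataset_for_client`.

```python
import random


def subsample_client_ids(client_ids, num_clients, seed=0):
    """Return a reproducible random subset of client IDs (no replacement)."""
    client_ids = list(client_ids)
    if num_clients > len(client_ids):
        raise ValueError("Cannot sample more clients than are available.")
    rng = random.Random(seed)  # fixed seed so experiments are repeatable
    return sorted(rng.sample(client_ids, num_clients))


# Hypothetical stand-in for the 3400 EMNIST client IDs (only_digits=False).
all_ids = [f"client_{i:04d}" for i in range(3400)]
subset = subsample_client_ids(all_ids, num_clients=500)
print(len(subset))
```

Only the examples held by the selected clients remain available, which is why the total example count shrinks along with the population.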

A more robust way to proceed would be to use tff.simulation.datasets.TransformingClientData, which allows you to take a ClientData (e.g., EMNIST) and expand each client into some number of sub-clients. This would allow you to experiment with larger population sizes.

@zcharles8
Collaborator

If neither of these solutions is exactly what you're looking for, could you add some details about what kind of feature you have in mind?

@Yeojoon
Author

Yeojoon commented May 9, 2023

Thank you for your quick and kind response!

What I want to do is increase or decrease the total number of users without changing the total number of data examples (for EMNIST, that means keeping the total number of training examples fixed at 341,873). So I agree with you that the first method may not solve this problem.

Do you think I can use your second solution to solve this problem?

@zcharles8
Collaborator

Could you provide a bit more detail here? How were you hoping to do this "re-partitioning", where the number of examples is fixed and the number of clients varies?

Note that [tff.simulation.datasets.TransformingClientData](https://www.tensorflow.org/federated/api_docs/python/tff/simulation/datasets/TransformingClientData) would allow increasing the number of clients, but would also increase the number of examples (essentially, each client would have their dataset "transformed" some number of times).
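To make the counting concrete, here is a toy model of that expansion in plain Python. It does not call the TFF API: `expand_clients`, the `factor` argument, and the `transform` callback are hypothetical names used only to show that every sub-client receives its own transformed copy of the parent's data, so clients and examples both scale by the same factor.

```python
def expand_clients(client_datasets, factor, transform):
    """Expand each client into `factor` sub-clients with transformed copies."""
    expanded = {}
    for client_id, examples in client_datasets.items():
        for k in range(factor):
            # Each sub-client holds a full transformed copy of the parent data.
            expanded[f"{client_id}_{k}"] = [transform(x, k) for x in examples]
    return expanded


# Toy population: 2 clients holding 5 examples in total.
toy = {"a": [1, 2, 3], "b": [4, 5]}
expanded = expand_clients(toy, factor=3, transform=lambda x, k: x + 10 * k)

num_clients = len(expanded)                             # 2 * 3 = 6
num_examples = sum(len(v) for v in expanded.values())   # 5 * 3 = 15
print(num_clients, num_examples)
```

Under the same reasoning, expanding the 3400 EMNIST clients by a factor of 3 would give 10,200 clients and 3 × 341,873 = 1,025,619 training examples, not a fixed 341,873.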

@Yeojoon
Author

Yeojoon commented May 9, 2023

Do you mean the second solution increases the total number of examples?

What I mean is that I would like to be able to choose the total number of users freely. Say the total number of users is n. Then, for EMNIST, each user should hold 341,873/n training examples.
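One way this could look, as a sketch and not an existing TFF feature: pool every example, shuffle, and cut the pool into n shards of nearly equal size. The `repartition` helper and the `user_i` IDs are hypothetical, and the datasets are plain Python lists so the snippet runs on its own. Since 341,873/n is usually not an integer, shard sizes here differ by at most one example. Pooling and shuffling also discards the original per-writer grouping, so the resulting users are roughly IID.

```python
import random


def repartition(client_datasets, num_users, seed=0):
    """Pool all examples, shuffle, and split into near-equal shards."""
    if num_users < 1:
        raise ValueError("num_users must be at least 1.")
    # Pool examples in a deterministic client order before shuffling.
    pooled = [x for cid in sorted(client_datasets) for x in client_datasets[cid]]
    random.Random(seed).shuffle(pooled)

    base, extra = divmod(len(pooled), num_users)
    shards, start = {}, 0
    for i in range(num_users):
        size = base + (1 if i < extra else 0)  # first `extra` users get one more
        shards[f"user_{i}"] = pooled[start:start + size]
        start += size
    return shards


# Toy population: 3 clients with 10 examples in total, re-split into 4 users.
toy = {"a": [0, 1, 2, 3, 4], "b": [5, 6], "c": [7, 8, 9]}
new_users = repartition(toy, num_users=4)
sizes = [len(v) for v in new_users.values()]
print(sizes)
```

The total number of examples is unchanged by construction; only the assignment of examples to users changes.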
