Label and feature skew Partitioner #3146

Open
WilliamLindskog opened this issue Mar 14, 2024 · 1 comment
Labels
feature request This issue or comment suggests an additional feature.

Comments

@WilliamLindskog
Contributor

Describe the type of feature and its functionality.

Hi there,

I've checked the datasets documentation and the open PRs, and I think these partitioners would be helpful.

In the NIID-Bench baseline, there is a partitioning strategy in which each client receives data with a specific number of unique labels, i.e., a label_quantity_partitioner (applicable only to classification tasks). For such a partitioner, one should be able to specify the number of labels allotted to each client; this number must be less than or equal to the number of unique labels in the dataset.
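A minimal sketch of the idea, assuming labels are given as a plain Python list and that each client draws its labels uniformly at random (not the NIID-Bench heuristic). The function name `label_quantity_partition` and its parameters are hypothetical, not an existing Flower API. Each label's samples are split evenly among the clients that drew it; samples of a label that no client drew are left unused.

```python
import random
from collections import defaultdict


def label_quantity_partition(labels, num_partitions, labels_per_partition, seed=42):
    """Hypothetical helper: give each partition `labels_per_partition` distinct
    labels chosen at random, then split each label's samples evenly among the
    partitions that hold it. Returns (index lists, chosen labels per partition)."""
    unique_labels = sorted(set(labels))
    if not 1 <= labels_per_partition <= len(unique_labels):
        raise ValueError("labels_per_partition must be in [1, number of unique labels]")
    rng = random.Random(seed)

    # Group sample indices by label.
    indices_by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        indices_by_label[label].append(idx)

    # Each partition draws its own set of distinct labels uniformly at random.
    partition_labels = [
        rng.sample(unique_labels, labels_per_partition) for _ in range(num_partitions)
    ]

    # Record which partitions hold each label.
    holders = defaultdict(list)
    for pid, chosen in enumerate(partition_labels):
        for label in chosen:
            holders[label].append(pid)

    partitions = [[] for _ in range(num_partitions)]
    for label, idxs in indices_by_label.items():
        owners = holders.get(label)
        if not owners:
            continue  # no partition drew this label, so its samples stay unused
        idxs = idxs[:]
        rng.shuffle(idxs)
        # Round-robin assignment keeps each holder's share within one sample.
        for k, idx in enumerate(idxs):
            partitions[owners[k % len(owners)]].append(idx)
    return partitions, partition_labels
```

For example, `label_quantity_partition([i % 10 for i in range(1000)], num_partitions=5, labels_per_partition=2)` yields five disjoint index lists, each containing samples from exactly its two drawn labels.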

Another partitioning strategy from the original paper is a feature-distribution skew based on Gaussian noise. Specifically, given a user-defined noise level $\sigma$, we would add noise $\hat{x} \sim \mathrm{Gau}(\sigma \cdot i/N)$ for party $P_i$, where $\mathrm{Gau}(\sigma \cdot i/N)$ is a Gaussian distribution with mean 0 and variance $\sigma \cdot i/N$.
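A rough sketch of the noise step, assuming features are already flattened into lists of floats and that parties are indexed $i = 1, \dots, N$. The helper name `add_gaussian_feature_noise` is hypothetical. It follows the variance $\sigma \cdot i/N$ as stated above; since `random.gauss` takes a standard deviation, the code uses its square root.

```python
import math
import random


def add_gaussian_feature_noise(partitions, sigma, seed=42):
    """Hypothetical helper: perturb every feature of party P_i (i = 1..N) with
    zero-mean Gaussian noise of variance sigma * i / N.

    `partitions` is a list of N partitions; each partition is a list of feature
    vectors (lists of floats). Returns noisy copies; the inputs are not mutated."""
    if sigma < 0:
        raise ValueError("sigma must be non-negative")
    rng = random.Random(seed)
    n = len(partitions)
    noisy = []
    for i, partition in enumerate(partitions, start=1):
        variance = sigma * i / n
        std = math.sqrt(variance)  # random.gauss expects a standard deviation
        noisy.append(
            [[x + rng.gauss(0.0, std) for x in features] for features in partition]
        )
    return noisy
```

With this indexing the last party receives the full variance $\sigma$ and the first receives $\sigma/N$, so feature skew grows linearly with the party index.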

What do you think?

Describe step by step what files and adjustments you are planning to include.

Two new partitioners would need to be created:

  1. Label quantity partitioner
  2. Gaussian noise partitioner

Tests for both would also be needed.

Is there something else you want to add?

N/A

WilliamLindskog added the feature request label Mar 14, 2024
@adam-narozniak
Member

Hi @WilliamLindskog
Thanks for writing the issue. We want to support both of them.
Regarding the first partitioner, I informally call it the ClassConstrain Partitioner (I think some people call it pathological, but I have seen that name used in a different context, too). It has also been used in other work. It will be supported shortly and is our current priority among the partitioning schemes. (There has already been an attempt to add it based on the implementation from the FedProx paper, but that implementation does not generalize well; it also used a heuristic to choose the classes, whereas we'll move to a purely probabilistic approach.)

Regarding the second partitioner: I'll move to it either directly after the ClassConstrain Partitioner is done, or after one more quantity-skew partitioner that works similarly to ClassConstrain but additionally assigns a small number of samples from the other classes (I'm not sure yet how it'll be parameterized, i.e., as a percentage or as raw numbers). In ClassConstrain, by contrast, those other classes get exactly zero samples.
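One possible shape for that "small amount of other classes" variant, sketched under the assumption that it is parameterized as a fraction. The function name `class_constrain_with_leakage` and the `other_fraction` parameter are hypothetical. A rounded share `other_fraction` of each label's samples is spread over the partitions that did not draw that label, and the rest goes to the partitions that did; with `other_fraction=0.0` it reduces to a plain ClassConstrain split.

```python
import random
from collections import defaultdict


def class_constrain_with_leakage(labels, num_partitions, labels_per_partition,
                                 other_fraction, seed=42):
    """Hypothetical sketch: each partition draws `labels_per_partition` primary
    labels at random; a rounded fraction `other_fraction` of every label's
    samples goes to partitions that did NOT draw it, the rest to those that did."""
    unique_labels = sorted(set(labels))
    if not 1 <= labels_per_partition <= len(unique_labels):
        raise ValueError("labels_per_partition must be in [1, number of unique labels]")
    if not 0.0 <= other_fraction < 1.0:
        raise ValueError("other_fraction must be in [0, 1)")
    rng = random.Random(seed)

    indices_by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        indices_by_label[label].append(idx)

    # Primary labels per partition, drawn uniformly at random.
    primary = [set(rng.sample(unique_labels, labels_per_partition))
               for _ in range(num_partitions)]

    partitions = [[] for _ in range(num_partitions)]
    for label, idxs in indices_by_label.items():
        holders = [p for p in range(num_partitions) if label in primary[p]]
        others = [p for p in range(num_partitions) if label not in primary[p]]
        if not holders:
            continue  # no partition drew this label, so its samples stay unused
        idxs = idxs[:]
        rng.shuffle(idxs)
        # Number of "leaked" samples for non-holders (zero if everyone holds it).
        n_leak = round(len(idxs) * other_fraction) if others else 0
        for k, idx in enumerate(idxs[:n_leak]):
            partitions[others[k % len(others)]].append(idx)
        for k, idx in enumerate(idxs[n_leak:]):
            partitions[holders[k % len(holders)]].append(idx)
    return partitions, primary
```

Using a fraction rather than raw counts keeps the amount of leakage proportional to each label's size; a raw-count variant would replace the `round(...)` line with a fixed number capped at `len(idxs)`.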

I'll keep you updated. Also, please let me know if you have other partitioning schemes you think we should add and would like to use.
