A question about running vertical FL in standalone simulation #43

Open
blziz opened this issue Aug 23, 2022 · 4 comments
Labels
enhancement New feature or request

Comments

@blziz

blziz commented Aug 23, 2022

Hi, I am confused about the missing_gh variable in the function compute_histogram_in_a_level. When using privacy_tech=he, part of missing_gh is in plaintext and the other part is in ciphertext. Is this correct?
I printed the plaintext values as follows:

missing_gh_data[pid] = -340.000000/757.000000;
nodes_data[nid].sum_gh_pair = -340.000000/757.000000; 
node_gh = 0.000000/0.000000;

Is this a security risk?

@blziz
Author

blziz commented Aug 23, 2022

missing_gh_data[pid].encrypted = false;
missing_gh_data[pid].g_enc = 0, missing_gh_data[pid].h_enc = 0;

@QinbinLi
Member

QinbinLi commented Aug 23, 2022

Hi @blziz ,

In vertical FL, one party (i.e., the aggregator) has the labels and can compute the raw gradients locally, so it does not need to compute missing_gh from encrypted gradients. The party with the labels does not send missing_gh to the other parties, so this is secure.
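
To make the plaintext path concrete, here is a minimal sketch. It assumes that missing_gh is the node's total gradient/hessian pair minus the sum over its histogram bins, which is consistent with the values printed above but is not confirmed in this thread. `GHPairSketch` and `missing_gh_plain` are hypothetical names for illustration, not FedTree's actual types or functions.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a gradient/hessian pair (not FedTree's real type).
struct GHPairSketch {
    double g = 0.0;         // plaintext gradient sum
    double h = 0.0;         // plaintext hessian sum
    bool encrypted = false; // false: values are held in plaintext
};

// Assumed definition: missing_gh = node total - sum of all histogram bins.
// This models only the label holder's plaintext path; no encryption is involved.
GHPairSketch missing_gh_plain(const GHPairSketch& node_sum,
                              const std::vector<GHPairSketch>& bins) {
    assert(!node_sum.encrypted);  // plaintext inputs only
    GHPairSketch missing = node_sum;
    for (const auto& bin : bins) {
        assert(!bin.encrypted);
        missing.g -= bin.g;
        missing.h -= bin.h;
    }
    return missing;  // stays local to the label-holding party
}
```

With an empty histogram, the result equals the node total, which matches the printed case where node_gh is 0/0 and missing_gh_data[pid] equals sum_gh_pair.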

@blziz
Author

blziz commented Aug 24, 2022

Thank you! I confirmed that this situation occurs in parties without labels in vertical FL. The parameter settings are as follows:

data=./dataset/test_dataset.txt
test_data=./dataset/test_dataset.txt
model_path=fedtree.model
partition_mode=vertical
n_parties=1
mode=vertical
privacy_tech=he
n_trees=40
depth=6
learning_rate=0.2
partition=1

and in homo_partition(),

for (int i = 0; i < n_parties; i++) {
    if (is_horizontal) { ... }
    if (!is_horizontal) {
        subsets[i].y = dataset.y;
        if (i == 0)
            subsets[i].has_label = false;
        else
            subsets[i].has_label = true;
    }
    ...
}

@QinbinLi
Member

Hi @blziz ,

Thanks a lot for the information! There is indeed a possible security risk. The unencrypted missing_gh arises because the whole tree model is shared among all parties in vertical FL, and the unencrypted missing_gh leaks no more information than the model itself. We are currently working on a more secure version that does not share the whole model. We also noticed the following issues.

  1. In homo_partition(), the label assignment was incorrect. In the simulation, party 0 (i == 0) is the host party and should have the labels. The other parties are guest parties and are supposed to have only features. We have fixed this.

  2. You need to set n_parties >= 2 to simulate a reasonable federated learning scenario. In vertical FL, at least one party has the labels. In our simulation, party 0 has the labels and the other parties do not.
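
The two points above can be sketched as follows. This is only an illustration of the described behavior (party 0 holds the labels, guests hold features only, and n_parties is at least 2), not the actual fix committed to the repository. `PartySubsetSketch` and `assign_labels_vertical` are hypothetical names, and giving the label vector only to the host is an assumption drawn from the statement that guests are supposed to have only features.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified per-party subset (not FedTree's dataset class).
struct PartySubsetSketch {
    std::vector<float> y;   // labels; left empty for guest parties
    bool has_label = false;
};

// Assign labels for a vertical partition: party 0 is the host and holds the
// labels, every other party is a guest without labels.
std::vector<PartySubsetSketch> assign_labels_vertical(
        const std::vector<float>& labels, int n_parties) {
    assert(n_parties >= 2);  // a single party is not a federated scenario
    std::vector<PartySubsetSketch> subsets(n_parties);
    for (int i = 0; i < n_parties; i++) {
        subsets[i].has_label = (i == 0);
        if (subsets[i].has_label) {
            subsets[i].y = labels;  // assumption: only the host receives y
        }
    }
    return subsets;
}
```

Compared with the snippet quoted earlier in the thread, the condition is inverted so that i == 0 yields has_label = true.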

QinbinLi added the enhancement (New feature or request) label on Aug 25, 2022