Issue with federate.method set to global #771

Open
KKNakkav2 opened this issue Apr 29, 2024 · 1 comment


Hello,

I launched the experiment with the following command:

python federatedscope/main.py --cfg federatedscope/cv/baseline/fedavg_convnet2_on_cifar10.yaml federate.client_num 1 federate.sample_client_rate 1.0 federate.method global

However, it looks like the model is not updated at evaluation time: the test accuracy stays at 11% in every round while the training accuracy improves.

2024-04-29 16:42:48,817 (client:357) INFO: {'Role': 'Client #1', 'Round': 0, 'Results_raw': {'train_avg_loss': 1.446278, 'train_total': 50000, 'train_acc': 0.48616, 'train_correct': 24308.0, 'train_loss': 72313.922022}}
2024-04-29 16:42:48,820 (server:344) INFO: Server: Starting evaluation at the end of round 0.
2024-04-29 16:42:50,443 (context:296) WARNING: No val_data or val_loader in the trainer, will skip evaluation.If this is not the case you want, please check whether there is typo for the name
2024-04-29 16:42:50,445 (server:960) INFO: {'Role': 'Server #', 'Round': 1, 'Results_raw': {'test_avg_loss': 2.301025, 'test_total': 10000, 'test_acc': 0.11, 'test_correct': 1100.0, 'test_loss': 23010.249565}}
2024-04-29 16:42:50,445 (server:350) INFO: ----------- Starting a new training round (Round #1) -------------
2024-04-29 16:42:59,204 (client:357) INFO: {'Role': 'Client #1', 'Round': 1, 'Results_raw': {'train_avg_loss': 1.096129, 'train_total': 50000, 'train_acc': 0.6146, 'train_correct': 30730.0, 'train_loss': 54806.453947}}
2024-04-29 16:42:59,206 (server:344) INFO: Server: Starting evaluation at the end of round 1.
2024-04-29 16:43:00,697 (context:296) WARNING: No val_data or val_loader in the trainer, will skip evaluation.If this is not the case you want, please check whether there is typo for the name
2024-04-29 16:43:00,697 (server:960) INFO: {'Role': 'Server #', 'Round': 2, 'Results_raw': {'test_avg_loss': 2.301025, 'test_total': 10000, 'test_acc': 0.11, 'test_correct': 1100.0, 'test_loss': 23010.249565}}
2024-04-29 16:43:00,697 (server:350) INFO: ----------- Starting a new training round (Round #2) -------------
2024-04-29 16:43:09,473 (client:357) INFO: {'Role': 'Client #1', 'Round': 2, 'Results_raw': {'train_avg_loss': 0.959477, 'train_total': 50000, 'train_acc': 0.66432, 'train_correct': 33216.0, 'train_loss': 47973.853004}}
2024-04-29 16:43:09,474 (server:344) INFO: Server: Starting evaluation at the end of round 2.
2024-04-29 16:43:11,000 (context:296) WARNING: No val_data or val_loader in the trainer, will skip evaluation.If this is not the case you want, please check whether there is typo for the name
2024-04-29 16:43:11,000 (server:960) INFO: {'Role': 'Server #', 'Round': 3, 'Results_raw': {'test_avg_loss': 2.301025, 'test_total': 10000, 'test_acc': 0.11, 'test_correct': 1100.0, 'test_loss': 23010.249565}}
2024-04-29 16:43:11,000 (server:350) INFO: ----------- Starting a new training round (Round #3) -------------
2024-04-29 16:43:19,756 (client:357) INFO: {'Role': 'Client #1', 'Round': 3, 'Results_raw': {'train_avg_loss': 0.867314, 'train_total': 50000, 'train_acc': 0.6992, 'train_correct': 34960.0, 'train_loss': 43365.681585}}
2024-04-29 16:43:19,757 (server:344) INFO: Server: Starting evaluation at the end of round 3.
2024-04-29 16:43:21,245 (context:296) WARNING: No val_data or val_loader in the trainer, will skip evaluation.If this is not the case you want, please check whether there is typo for the name
2024-04-29 16:43:21,245 (server:960) INFO: {'Role': 'Server #', 'Round': 4, 'Results_raw': {'test_avg_loss': 2.301025, 'test_total': 10000, 'test_acc': 0.11, 'test_correct': 1100.0, 'test_loss': 23010.249565}}
2024-04-29 16:43:21,246 (server:350) INFO: ----------- Starting a new training round (Round #4) -------------
2024-04-29 16:43:29,954 (client:357) INFO: {'Role': 'Client #1', 'Round': 4, 'Results_raw': {'train_avg_loss': 0.794839, 'train_total': 50000, 'train_acc': 0.72466, 'train_correct': 36233.0, 'train_loss': 39741.947819}}
2024-04-29 16:43:29,956 (server:344) INFO: Server: Starting evaluation at the end of round 4.
2024-04-29 16:43:31,466 (context:296) WARNING: No val_data or val_loader in the trainer, will skip evaluation.If this is not the case you want, please check whether there is typo for the name
2024-04-29 16:43:31,466 (server:960) INFO: {'Role': 'Server #', 'Round': 5, 'Results_raw': {'test_avg_loss': 2.301025, 'test_total': 10000, 'test_acc': 0.11, 'test_correct': 1100.0, 'test_loss': 23010.249565}}
2024-04-29 16:43:31,467 (server:350) INFO: ----------- Starting a new training round (Round #5) -------------
2024-04-29 16:43:40,151 (client:357) INFO: {'Role': 'Client #1', 'Round': 5, 'Results_raw': {'train_avg_loss': 0.730278, 'train_total': 50000, 'train_acc': 0.74824, 'train_correct': 37412.0, 'train_loss': 36513.923683}}
2024-04-29 16:43:40,153 (server:344) INFO: Server: Starting evaluation at the end of round 5.
2024-04-29 16:43:41,618 (context:296) WARNING: No val_data or val_loader in the trainer, will skip evaluation.If this is not the case you want, please check whether there is typo for the name
2024-04-29 16:43:41,619 (server:960) INFO: {'Role': 'Server #', 'Round': 6, 'Results_raw': {'test_avg_loss': 2.301025, 'test_total': 10000, 'test_acc': 0.11, 'test_correct': 1100.0, 'test_loss': 23010.249565}}
2024-04-29 16:43:41,619 (server:350) INFO: ----------- Starting a new training round (Round #6) -------------
2024-04-29 16:43:50,265 (client:357) INFO: {'Role': 'Client #1', 'Round': 6, 'Results_raw': {'train_avg_loss': 0.671751, 'train_total': 50000, 'train_acc': 0.77108, 'train_correct': 38554.0, 'train_loss': 33587.532097}}
2024-04-29 16:43:50,266 (server:344) INFO: Server: Starting evaluation at the end of round 6.
2024-04-29 16:43:51,771 (context:296) WARNING: No val_data or val_loader in the trainer, will skip evaluation.If this is not the case you want, please check whether there is typo for the name
2024-04-29 16:43:51,771 (server:960) INFO: {'Role': 'Server #', 'Round': 7, 'Results_raw': {'test_avg_loss': 2.301025, 'test_total': 10000, 'test_acc': 0.11, 'test_correct': 1100.0, 'test_loss': 23010.249565}}
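
The constant server-side numbers are consistent with a model that is still at or near its initial weights: for a 10-class problem such as CIFAR-10, uniform predictions give a cross-entropy of ln(10) and roughly 10% accuracy. The following is only a sanity check on the values copied from the log above. It suggests, but does not prove, that the evaluated model was never trained.

```python
import math

num_classes = 10  # CIFAR-10 has 10 classes

# Cross-entropy of a model that assigns equal probability to every class.
uniform_loss = -math.log(1.0 / num_classes)

# Values copied from the 'Server #' lines in the log above.
reported_loss = 2.301025
reported_acc = 1100.0 / 10000

print(f"uniform-prediction loss: {uniform_loss:.6f}")
print(f"reported test_avg_loss:  {reported_loss:.6f}")
print(f"chance accuracy: {1 / num_classes:.2f}, reported: {reported_acc:.2f}")
```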

KKNakkav2 commented Apr 29, 2024

I think I have found one reason for this behaviour.

If federate.method is set to global, there is no model_para broadcast (see the workers/server.py file) to the single client (worker index 1, I think) where the local training happens. Moreover, since merge_test_data and make_global_eval are both set to True, the evaluation happens on the server (worker index 0), which has never received the updated model.

I think that if the method is set to global, we should possibly not activate merge_test_data or make_global_eval. Please correct me if I am wrong. The same reasoning applies when federate.method is set to local, since there is no broadcast in that setting either.
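
To illustrate the suspected mechanism, here is a toy sketch that does not use FederatedScope at all. `ToyModel`, `run`, and `sync_to_server` are hypothetical names invented for this example. The assumption is that the server and the client each hold their own copy of the model, and that the server evaluates its own copy.

```python
import copy


class ToyModel:
    """Stand-in for a real network: a single scalar 'weight'."""

    def __init__(self):
        self.w = 0.0

    def train_step(self):
        # Pretend one round of local training improves the weight.
        self.w += 1.0

    def evaluate(self):
        # Pretend the metric is just the current weight.
        return self.w


def run(rounds, sync_to_server):
    server_model = ToyModel()
    client_model = copy.deepcopy(server_model)  # separate copy on the client
    server_eval = []
    for _ in range(rounds):
        client_model.train_step()
        if sync_to_server:
            # Models are exchanged, so the server sees the trained weights.
            server_model.w = client_model.w
        # Global evaluation always runs on the server's copy.
        server_eval.append(server_model.evaluate())
    return server_eval


no_sync = run(3, sync_to_server=False)
with_sync = run(3, sync_to_server=True)
print("server eval without sync:", no_sync)
print("server eval with sync:   ", with_sync)
```

Without any exchange, the server metric stays at its initial value in every round, which matches the flat test_acc in the log. Whether this is what actually happens inside FederatedScope needs to be confirmed by the maintainers.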
