Model parameters are not updated after the train function runs, so the test function reports the same evaluation metrics as in the initial stage #3164
Labels
bug
Describe the bug
In my train function, I save the best model parameters to checkpoint.pth as shown below. However, the model used in my test function still has the parameters from the initial stage rather than the updated ones, so the final output is identical to the output of the initial stage. Is the problem in my set_parameters() and get_parameters() functions, or in the way I save the model parameters? How can I solve it?
Steps/Code to Reproduce
            early_stopping(vali_loss, self.model, path)
            if early_stopping.early_stop:
                print("Early stopping")
                break

        best_model_path = path + '/' + 'checkpoint.pth'
        self.model.load_state_dict(torch.load(best_model_path))

    def test(self, model):
        test_data, test_loader = self._get_data(flag='test')
        train_data, train_loader = self._get_data(flag='train')
        test_steps = len(train_loader)
        criterion = self._select_criterion()
        preds = []
        trues = []
        inputx = []
        self.model.eval()

    def set_parameters(self, parameters):
        params_dict = zip(self.model.state_dict().keys(), parameters)
        state_dict = OrderedDict({k: torch.Tensor(v) for k, v in params_dict})
        # now replace the parameters
        self.model.load_state_dict(state_dict, strict=True)
        # print("Parameters set successfully:", state_dict)
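One way to narrow this down is to snapshot the parameters before and after training and check that they differ, then confirm that a set/get round trip preserves them. The following is a minimal sketch and not the code from this issue: `nn.Linear` is a hypothetical stand-in for the real model, `get_parameters` is an assumed counterpart to the `set_parameters` shown above, and `params_changed` is an illustrative helper. It uses `torch.as_tensor` in place of `torch.Tensor` so the dtype of each array is kept.

```python
from collections import OrderedDict

import numpy as np
import torch
from torch import nn


def get_parameters(model):
    """Return the state_dict values as a list of NumPy arrays (copied)."""
    # .copy() matters: .numpy() shares memory with the live tensor, so a
    # snapshot without a copy would silently follow later weight updates.
    return [v.detach().cpu().numpy().copy() for v in model.state_dict().values()]


def set_parameters(model, parameters):
    """Load a list of NumPy arrays into the model, in state_dict key order."""
    keys = model.state_dict().keys()
    state_dict = OrderedDict((k, torch.as_tensor(v)) for k, v in zip(keys, parameters))
    model.load_state_dict(state_dict, strict=True)


def params_changed(before, after):
    """Return True if any array differs between the two snapshots."""
    return any(not np.array_equal(b, a) for b, a in zip(before, after))


torch.manual_seed(0)
model = nn.Linear(4, 1)  # hypothetical stand-in for the real model
before = get_parameters(model)

# One SGD step on random data, standing in for the real train function.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 4), torch.randn(8, 1)
loss = nn.functional.mse_loss(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()

after = get_parameters(model)
print("changed by training:", params_changed(before, after))

# Round trip: load the trained weights into a fresh model and read them back.
fresh = nn.Linear(4, 1)
set_parameters(fresh, after)
print("round trip preserved:", not params_changed(after, get_parameters(fresh)))
```

If the same check on the real client reports no change after the train function returns, the weights are being lost before they are returned (for example, the checkpoint reloaded after the loop may not be the one that early stopping wrote). If the check passes, the evaluation side is the more likely place to look. Note also that `test(self, model)` above accepts a `model` argument but evaluates `self.model`, so it is worth confirming that both refer to the same object.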
Expected Results
I expect the test function to receive the model parameters produced by the train function, so the final evaluation metrics should differ from those of the first evaluation.
Actual Results
DEBUG flwr 2024-03-20 23:38:35,516 | server.py:187 | evaluate_round 1 received 3 results and 0 failures
WARNING flwr 2024-03-20 23:38:35,516 | fedavg.py:273 | No evaluate_metrics_aggregation_fn provided
INFO flwr 2024-03-20 23:38:35,517 | server.py:153 | FL finished in 4911.190346067073
INFO flwr 2024-03-20 23:38:35,597 | app.py:226 | app_fit: losses_distributed [(1, nan)]
INFO flwr 2024-03-20 23:38:35,597 | app.py:227 | app_fit: metrics_distributed_fit {}
INFO flwr 2024-03-20 23:38:35,597 | app.py:228 | app_fit: metrics_distributed {}
INFO flwr 2024-03-20 23:38:35,597 | app.py:229 | app_fit: losses_centralized [(0, 0.7456194370291954), (1, 0.7456194370291954)]
INFO flwr 2024-03-20 23:38:35,597 | app.py:230 | app_fit: metrics_centralized {'MAE': [(0, 0.5941068), (1, 0.5941068)], 'MSE': [(0, 0.7456194), (1, 0.7456194)], 'RMSE': [(0, 0.86349255), (1, 0.86349255)]}
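The centralized loss is identical for rounds 0 and 1, which would also happen if the server-side evaluation never loads the parameters it is given. The sketch below assumes the centralized metrics come from an `evaluate_fn` passed to the strategy, called with `(server_round, parameters, config)`; that assumption is not confirmed by the issue. `make_evaluate_fn`, the `nn.Linear` model, and the random data are all hypothetical stand-ins.

```python
from collections import OrderedDict

import numpy as np
import torch
from torch import nn


def make_evaluate_fn(model, x, y):
    """Build a server-side evaluation callback (hypothetical helper)."""

    def evaluate(server_round, parameters, config):
        # Load the parameters received for this round before evaluating;
        # skipping this step makes every round score the same weights.
        keys = model.state_dict().keys()
        state_dict = OrderedDict(
            (k, torch.as_tensor(v)) for k, v in zip(keys, parameters)
        )
        model.load_state_dict(state_dict, strict=True)
        model.eval()
        with torch.no_grad():
            mse = nn.functional.mse_loss(model(x), y).item()
        return mse, {"MSE": mse}

    return evaluate


torch.manual_seed(0)
x, y = torch.randn(16, 4), torch.randn(16, 1)
evaluate = make_evaluate_fn(nn.Linear(4, 1), x, y)

# Two different parameter sets, standing in for rounds 0 and 1.
round0 = [np.zeros((1, 4), dtype=np.float32), np.zeros(1, dtype=np.float32)]
round1 = [np.ones((1, 4), dtype=np.float32), np.zeros(1, dtype=np.float32)]
loss0, _ = evaluate(0, round0, {})
loss1, _ = evaluate(1, round1, {})
print("round 0 loss:", loss0)
print("round 1 loss:", loss1)
```

With different parameters per round, the two losses differ. Identical values in the real run therefore point either to the evaluation not loading the received parameters or to the clients returning unchanged parameters. The `nan` in `losses_distributed` may be a separate problem in client-side evaluation and is worth checking on its own.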