Facing issue with Flower Simulation with ResNet18 and MNIST dataset #3237

EzyHow opened this issue Apr 8, 2024 · 3 comments

bug Something isn't working


EzyHow commented Apr 8, 2024

Describe the bug

I was trying an example project for Flower Simulation (Flower Simulation Step by Step Pytorch - Part II). Everything worked until I changed the model to resnet18, as shown below:

class Net(nn.Module):
    def __init__(self, num_classes: int) -> None:
        super(Net, self).__init__()
        self.model = models.resnet18()
        for param in self.model.parameters():
            param.requires_grad = False
        self.model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
        num_ftrs = self.model.fc.in_features
        self.model.fc = nn.Linear(num_ftrs, num_classes)
        summary(self.model, input_size=(1, 28, 28)) # <<== THIS LINE

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.model(x)
        return x

If I add summary(self.model, input_size=(1, 28, 28)) at the end of the __init__() method, everything works. But when I remove it, evaluate_fn fails with IndexError: index 0 is out of bounds for dimension 0 with size 0 (raised at input_param = input_param[0]) in the following code:

params_dict = zip(model.state_dict().keys(), parameters)
state_dict = OrderedDict({k: torch.Tensor(v) for k, v in params_dict})
model.load_state_dict(state_dict, strict=True) # <= At this line I'm getting error

Steps/Code to Reproduce

Clone the repository from Flower Simulation Step by Step Pytorch Part-II and follow the instructions to set up the environment.

Then change the model to resnet18 as shown below:

import torch
import torch.nn as nn
import torchvision.models as models
from flwr.common.parameter import ndarrays_to_parameters
from torchsummary import summary

class Net(nn.Module):
    def __init__(self, num_classes: int) -> None:
        super(Net, self).__init__()

        self.model = models.resnet18()
        for param in self.model.parameters():
            param.requires_grad = False
        self.model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
        num_ftrs = self.model.fc.in_features
        self.model.fc = nn.Linear(num_ftrs, num_classes)
        summary(self.model, input_size=(1, 28, 28))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.model(x)
        return x

Following is the list of packages installed in the conda environment:

requirement.txt file

flwr[simulation]>=1.0, <2.0

Expected Results

The following is the output when it runs successfully (with the line summary(self.model, input_size=(1, 28, 28)) added):

{'history': History (loss, distributed):
	round 1: 6.738090056180954
	round 2: 3.8934330970048903
History (loss, centralized):
	round 0: 366.1482033729553
	round 1: 97.4027541577816
	round 2: 52.76616382226348
History (metrics, centralized):
{'accuracy': [(0, 0.1086), (1, 0.8021), (2, 0.8959)]}

Actual Results

When I remove the line summary(self.model, input_size=(1, 28, 28)), I get the following error:

[2024-04-08 09:43:34,760][flwr][INFO] - Initializing global parameters
[2024-04-08 09:43:34,761][flwr][INFO] - Requesting initial parameters from one random client
[2024-04-08 09:43:37,337][flwr][INFO] - Received initial parameters from one random client
[2024-04-08 09:43:37,338][flwr][INFO] - Evaluating initial parameters
[2024-04-08 09:43:37,644][flwr][ERROR] - index 0 is out of bounds for dimension 0 with size 0
[2024-04-08 09:43:37,646][flwr][ERROR] - Traceback (most recent call last):
  File "/root/miniconda3/envs/flower_env/lib/python3.9/site-packages/flwr/simulation/", line 308, in start_simulation
    hist = run_fl(
  File "/root/miniconda3/envs/flower_env/lib/python3.9/site-packages/flwr/server/", line 225, in run_fl
    hist =, timeout=config.round_timeout)
  File "/root/miniconda3/envs/flower_env/lib/python3.9/site-packages/flwr/server/", line 92, in fit
    res = self.strategy.evaluate(0, parameters=self.parameters)
  File "/root/miniconda3/envs/flower_env/lib/python3.9/site-packages/flwr/server/strategy/", line 165, in evaluate
    eval_res = self.evaluate_fn(server_round, parameters_ndarrays, {})
  File "/root/development/machine-learning-project/", line 42, in evaluate_fn
    model.load_state_dict(state_dict, strict=True)
  File "/root/miniconda3/envs/flower_env/lib/python3.9/site-packages/torch/nn/modules/", line 1657, in load_state_dict
    load(self, state_dict)
  File "/root/miniconda3/envs/flower_env/lib/python3.9/site-packages/torch/nn/modules/", line 1645, in load
    load(child, child_state_dict, child_prefix)
  File "/root/miniconda3/envs/flower_env/lib/python3.9/site-packages/torch/nn/modules/", line 1645, in load
    load(child, child_state_dict, child_prefix)
  File "/root/miniconda3/envs/flower_env/lib/python3.9/site-packages/torch/nn/modules/", line 1639, in load
  File "/root/miniconda3/envs/flower_env/lib/python3.9/site-packages/torch/nn/modules/", line 110, in _load_from_state_dict
    super(_NormBase, self)._load_from_state_dict(
  File "/root/miniconda3/envs/flower_env/lib/python3.9/site-packages/torch/nn/modules/", line 1562, in _load_from_state_dict
    input_param = input_param[0]
IndexError: index 0 is out of bounds for dimension 0 with size 0

[2024-04-08 09:43:37,648][flwr][ERROR] - Your simulation crashed :(. This could be because of several reasons. The most common are: 
	 > Sometimes, issues in the simulation code itself can cause crashes. It's always a good idea to double-check your code for any potential bugs or inconsistencies that might be contributing to the problem. For example: 
		 - You might be using a class attribute in your clients that hasn't been defined.
		 - There could be an incorrect method call to a 3rd party library (e.g., PyTorch).
		 - The return types of methods in your clients/strategies might be incorrect.
	 > Your system couldn't fit a single VirtualClient: try lowering `client_resources`.
	 > All the actors in your pool crashed. This could be because: 
		 - You clients hit an out-of-memory (OOM) error and actors couldn't recover from it. Try launching your simulation with more generous `client_resources` setting (i.e. it seems {'num_cpus': 1, 'num_gpus': 0.0} is not enough for your run). Use fewer concurrent actors. 
		 - You were running a multi-node simulation and all worker nodes disconnected. The head node might still be alive but cannot accommodate any actor with resources: {'num_cpus': 1, 'num_gpus': 0.0}.
Take a look at the Flower simulation examples for guidance <>.

Hi @EzyHow, have you added that summary(self.model, input_size=(1, 28, 28)) somewhere else? Maybe also in the evaluation code? I wonder if torchsummary is adding something to the state_dict...
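
One way to check whether a forward pass alone changes the state_dict is to compare it before and after. The sketch below assumes that torchsummary runs a forward pass internally (not confirmed in this thread) and uses a small hypothetical stand-in model with a BatchNorm layer instead of the resnet18 from the issue:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the ResNet18 in the issue: any model with BatchNorm will do.
model = nn.Sequential(nn.Conv2d(1, 4, kernel_size=3), nn.BatchNorm2d(4))

# state_dict() tensors share storage with the model, so clone them for a real snapshot.
before = {k: v.clone() for k, v in model.state_dict().items()}

# One forward pass in training mode (the default for a freshly built module).
model(torch.rand(2, 1, 28, 28))

after = model.state_dict()

# Compare keys and values before and after the forward pass.
same_keys = list(before.keys()) == list(after.keys())
changed = [k for k in before if not torch.equal(before[k], after[k])]
tracked_before = int(before["1.num_batches_tracked"])
tracked_after = int(after["1.num_batches_tracked"])

print("same keys:", same_keys)
print("changed entries:", changed)
print("num_batches_tracked:", tracked_before, "->", tracked_after)
```

In this sketch no keys are added, but the BatchNorm buffers do change: in particular, the scalar num_batches_tracked counter is incremented by the forward pass.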

EzyHow commented Apr 8, 2024

Flower Simulation Step by Step Pytorch Part-II

Kindly check this repository for detailed code: Testing Flower Simulation

In this repository, please go through the main.log files for the three different scenarios in the output directory.

I encountered the same issue and found a solution. I noticed the ndarrays_to_model function in src/. The relevant code is:

def ndarrays_to_model(model: torch.nn.ModuleList, params: List[np.ndarray]):
    """Set model weights from a list of NumPy ndarrays."""
    params_dict = zip(model.state_dict().keys(), params)
    state_dict = OrderedDict({k: torch.from_numpy(np.copy(v)) for k, v in params_dict})
    model.load_state_dict(state_dict, strict=True)

Therefore, I changed

state_dict = OrderedDict({k: torch.Tensor(v) for k, v in params_dict})

to

state_dict = OrderedDict({k: torch.from_numpy(np.copy(v)) for k, v in params_dict})

in both the set_parameters function and evaluate_fn. Please also import numpy:

import numpy as np

I hope it will work for you as well.
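
A possible reason this helps (an assumption, not confirmed in this thread): torch.from_numpy keeps each array's shape and dtype, including 0-dimensional buffers such as BatchNorm's num_batches_tracked, whereas the legacy torch.Tensor(...) constructor may not. Below is a minimal round-trip sketch; get_ndarrays and set_ndarrays are hypothetical helper names, and the small model is a stand-in for resnet18:

```python
from collections import OrderedDict
from typing import List

import numpy as np
import torch
import torch.nn as nn


def get_ndarrays(model: nn.Module) -> List[np.ndarray]:
    """Export every state_dict entry as a NumPy array (hypothetical helper)."""
    return [v.cpu().numpy() for v in model.state_dict().values()]


def set_ndarrays(model: nn.Module, params: List[np.ndarray]) -> OrderedDict:
    """Load NumPy arrays back into the model, preserving shape and dtype."""
    params_dict = zip(model.state_dict().keys(), params)
    state_dict = OrderedDict(
        (k, torch.from_numpy(np.copy(v))) for k, v in params_dict
    )
    model.load_state_dict(state_dict, strict=True)
    return state_dict


# Hypothetical stand-in model containing a BatchNorm layer.
model = nn.Sequential(nn.Conv2d(1, 4, kernel_size=3), nn.BatchNorm2d(4))

params = get_ndarrays(model)
restored = set_ndarrays(model, params)

# The scalar BatchNorm counter keeps its 0-dimensional shape after the round trip.
scalar_shapes = {
    k: tuple(v.shape)
    for k, v in restored.items()
    if k.endswith("num_batches_tracked")
}
print(scalar_shapes)
```

The strict load succeeds here even though no forward pass has been run, which matches the scenario where the summary(...) line is removed.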

