FedProx example #564

Open · wants to merge 6 commits into base: master
249 changes: 249 additions & 0 deletions examples/mnist-pytorch-fedprox/API_Example.ipynb

Large diffs are not rendered by default.

160 changes: 160 additions & 0 deletions examples/mnist-pytorch-fedprox/README.rst
@@ -0,0 +1,160 @@


Quickstart Tutorial PyTorch with FedProx (MNIST)
-------------------------------------------------
This is an enhanced version of our PyTorch MNIST example that lets you use the FedProx algorithm (`paper <https://arxiv.org/abs/1812.06127>`__). FedProx extends FedAvg by adding a proximal term to each client's local objective, penalizing the distance between the local weights and the current global model; this stabilizes training on heterogeneous (non-IID) client data.
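
The heart of FedProx is this proximal term. Below is a minimal PyTorch sketch of how it enters the local loss (illustrative only; names such as ``proximal_term`` and ``global_weights`` are our own and not taken from this example's entrypoint):

.. code-block:: python

   import torch

   def proximal_term(model, global_weights, mu):
       """(mu / 2) * ||w - w_global||^2, summed over all model parameters."""
       norm = 0.0
       for w, w_global in zip(model.parameters(), global_weights):
           norm = norm + torch.sum((w - w_global) ** 2)
       return (mu / 2) * norm

   # Inside the local training loop, the term is added to the task loss:
   # loss = criterion(model(x), y) + proximal_term(model, global_weights, mu)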


This classic example of handwritten digit recognition is well suited as a lightweight test when developing on FEDn in pseudo-distributed mode.
A normal high-end laptop or a workstation should be able to sustain a few clients.
The example automates the partitioning of data and deployment of a variable number of clients on a single host.
We assume working experience with containers, Docker and docker-compose.

Prerequisites
-------------

- `Python 3.8, 3.9 or 3.10 <https://www.python.org/downloads>`__
- `Docker <https://docs.docker.com/get-docker>`__
- `Docker Compose <https://docs.docker.com/compose/install>`__

Quick start
-----------

Clone this repository and change into this example's directory:

.. code-block::

git clone https://github.com/scaleoutsystems/fedn.git
cd fedn/examples/mnist-pytorch-fedprox

Start a pseudo-distributed FEDn network using docker-compose:

.. code-block::

docker-compose -f ../../docker-compose.yaml up

This starts the needed backend services (MongoDB and Minio), the API Server, and one Combiner.
You can verify the deployment using these URLs:

- API Server: http://localhost:8092/get_controller_status
- Minio: http://localhost:9000
- Mongo Express: http://localhost:8081
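
For example, a quick check that the API Server is responding:

.. code-block::

   curl http://localhost:8092/get_controller_status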

Next, we will prepare the client. A key concept in FEDn is the compute package -
a code bundle that contains entrypoints for training and (optionally) validating a model update on the client.

Navigate to 'examples/mnist-pytorch-fedprox' and familiarize yourself with the project structure. The entrypoints
are defined in 'client/entrypoint'. The dependencies needed in the client environment are specified in
'requirements.txt'. For convenience, we have provided utility scripts to set up a virtual environment.
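
As an illustration of the entrypoint pattern, the script is a plain Python program whose commands are dispatched from the command line (a hypothetical outline only; see 'client/entrypoint' for the actual signatures and logic):

.. code-block:: python

   # Hypothetical outline of a FEDn entrypoint script.
   import fire

   def init_seed(out_path='seed.npz'):
       """Create and save the initial (seed) model."""
       ...

   def train(in_model_path, out_model_path, data_path=None):
       """Load the global model, train locally with the FedProx
       proximal term, and save the model update."""
       ...

   def validate(in_model_path, out_report_path, data_path=None):
       """Evaluate the current model and save validation metrics."""
       ...

   if __name__ == '__main__':
       fire.Fire()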

Start by initializing a virtual environment with all of the required dependencies for this project.

.. code-block::

bin/init_venv.sh

Next create the compute package and a seed model:

.. code-block::

bin/build.sh

You should now have the files 'package.tgz' and 'seed.npz' in the project folder.

Next, we prepare the local dataset. For this, we download the MNIST data and create data partitions:

Download the data:

.. code-block::

bin/get_data


Split the data into 10 partitions:

.. code-block::

bin/split_data --n_splits=10

Data partitions will be generated in the folder 'data/clients'.
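
Each partition is a single PyTorch file holding that client's tensors. To inspect one (a quick sketch, assuming partition 1 has been generated):

.. code-block:: python

   import torch

   data = torch.load('data/clients/1/mnist.pt')
   # With 10 splits: torch.Size([6000, 28, 28])
   print(data['x_train'].shape)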

FEDn relies on a configuration file for the client to connect to the server. Create a file called 'client.yaml' with the following content:

.. code-block::

network_id: fedn-network
discover_host: api-server
discover_port: 8092

Make sure to move the file ``client.yaml`` to the root of the examples/mnist-pytorch-fedprox folder.
To connect a client that uses the data partition ``data/clients/1/mnist.pt`` and the config file ``client.yaml`` to the network, run the following docker command:

.. code-block::

docker run \
  -v $PWD/client.yaml:/app/client.yaml \
  -v $PWD/data/clients/1:/var/data \
  -e ENTRYPOINT_OPTS=--data_path=/var/data/mnist.pt \
  --network=fedn_default \
  ghcr.io/scaleoutsystems/fedn/fedn:master-mnist-pytorch run client -in client.yaml --name client1

Observe the API Server and combiner logs; you should see the client connect and enter a state where it asks for a compute package.

In a separate terminal, start a second client using the data partition 'data/clients/2/mnist.pt':

.. code-block::

docker run \
  -v $PWD/client.yaml:/app/client.yaml \
  -v $PWD/data/clients/2:/var/data \
  -e ENTRYPOINT_OPTS=--data_path=/var/data/mnist.pt \
  --network=fedn_default \
  ghcr.io/scaleoutsystems/fedn/fedn:master-mnist-pytorch run client -in client.yaml --name client2

You are now ready to use the API to initialize the system with the compute package and seed model, and to start federated training.

- Follow the example in the `Jupyter Notebook <https://github.com/scaleoutsystems/fedn/blob/master/examples/mnist-pytorch-fedprox/API_Example.ipynb>`__ (a rough script equivalent is sketched below)
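
If you prefer a plain script over the notebook, the flow looks roughly as follows. This is a hedged sketch: ``APIClient`` method names and signatures have varied between FEDn versions, so treat the notebook as the authoritative reference.

.. code-block:: python

   from fedn import APIClient

   client = APIClient(host='localhost', port=8092)
   client.set_package('package.tgz', helper='pytorchhelper')  # upload compute package
   client.set_initial_model('seed.npz')                       # upload seed model
   client.start_session(rounds=10)                            # start federated training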



Automate experimentation with several clients
-----------------------------------------------

Now that you have an understanding of the main components of FEDn, you can use the provided docker-compose templates to automate deployment of FEDn and clients.
To start the network and attach 4 clients:

.. code-block::

docker-compose -f ../../docker-compose.yaml -f docker-compose.override.yaml up --scale client=4


Access logs and validation data from MongoDB
---------------------------------------------

You can access and download event logs and validation data via the API. As a developer, you can also
inspect the MongoDB backend data directly using pymongo or via the Mongo Express interface:

- http://localhost:8081/db/fedn-network/

The credentials are as set in docker-compose.yaml in the root of the repository.
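
For example, a minimal pymongo sketch (the connection details are placeholders to be filled in from docker-compose.yaml; the database name matches the Mongo Express URL above):

.. code-block:: python

   import pymongo

   # Substitute <user>, <password> and <port> with the values from docker-compose.yaml.
   client = pymongo.MongoClient('mongodb://<user>:<password>@localhost:<port>')
   db = client['fedn-network']
   print(db.list_collection_names())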

Adjust the FedProx parameter μ
--------------------------------
Open the file 'client_settings.yaml' and change the value of mu.
If mu is set to 0, the proximal term vanishes and training reduces to vanilla FedAvg.
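
For example (a sketch assuming ``mu`` is a top-level key; see the 'client_settings.yaml' shipped with this example for the exact set of settings):

.. code-block:: yaml

   # client_settings.yaml -- mu controls the strength of the proximal term
   mu: 0.1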

Access model updates
-----------------------

You can obtain model updates from the 'fedn-models' bucket in Minio:

- http://localhost:9000
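
Besides the web interface, the bucket can be read programmatically. A sketch using the MinIO Python client (the access credentials are placeholders to be taken from docker-compose.yaml):

.. code-block:: python

   from minio import Minio

   # Substitute the access and secret keys configured in docker-compose.yaml.
   client = Minio('localhost:9000', access_key='<access_key>',
                  secret_key='<secret_key>', secure=False)
   for obj in client.list_objects('fedn-models'):
       print(obj.object_name)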


Clean up
-----------
You can clean up by running:

.. code-block::

docker-compose down
8 changes: 8 additions & 0 deletions examples/mnist-pytorch-fedprox/bin/build.sh
@@ -0,0 +1,8 @@
#!/bin/bash
set -e

# Init seed
client/entrypoint init_seed

# Make compute package
tar -czvf package.tgz client
21 changes: 21 additions & 0 deletions examples/mnist-pytorch-fedprox/bin/get_data
@@ -0,0 +1,21 @@
#!./.mnist-pytorch/bin/python
import os

import fire
import torchvision


def get_data(out_dir='data'):
    # Make dir if necessary
    if not os.path.exists(out_dir):
        os.mkdir(out_dir)

    # Download data; note that ToTensor must be instantiated to be a valid transform
    torchvision.datasets.MNIST(
        root=f'{out_dir}/train', transform=torchvision.transforms.ToTensor(), train=True, download=True)
    torchvision.datasets.MNIST(
        root=f'{out_dir}/test', transform=torchvision.transforms.ToTensor(), train=False, download=True)


if __name__ == '__main__':
fire.Fire(get_data)
10 changes: 10 additions & 0 deletions examples/mnist-pytorch-fedprox/bin/init_venv.sh
@@ -0,0 +1,10 @@
#!/bin/bash
set -e

# Init venv
python3.10 -m venv .mnist-pytorch

# Pip deps
.mnist-pytorch/bin/pip install --upgrade pip
.mnist-pytorch/bin/pip install -e ../../fedn
.mnist-pytorch/bin/pip install -r requirements.txt
51 changes: 51 additions & 0 deletions examples/mnist-pytorch-fedprox/bin/split_data
@@ -0,0 +1,51 @@
#!./.mnist-pytorch/bin/python
import os
from math import floor

import fire
import torch
import torchvision


def splitset(dataset, parts):
    # Split along the first dimension into `parts` equally sized chunks;
    # any remainder samples (n % parts) are dropped.
    n = dataset.shape[0]
    local_n = floor(n/parts)
    result = []
    for i in range(parts):
        result.append(dataset[i*local_n: (i+1)*local_n])
    return result


def split(out_dir='data', n_splits=2):
    # Make dir (makedirs also creates out_dir if bin/get_data has not run yet)
    if not os.path.exists(f'{out_dir}/clients'):
        os.makedirs(f'{out_dir}/clients')

    # Load train and test sets; ToTensor must be instantiated, although only
    # the raw .data/.targets tensors are used below
    train_data = torchvision.datasets.MNIST(
        root=f'{out_dir}/train', transform=torchvision.transforms.ToTensor(), train=True)
    test_data = torchvision.datasets.MNIST(
        root=f'{out_dir}/test', transform=torchvision.transforms.ToTensor(), train=False)
    data = {
        'x_train': splitset(train_data.data, n_splits),
        'y_train': splitset(train_data.targets, n_splits),
        'x_test': splitset(test_data.data, n_splits),
        'y_test': splitset(test_data.targets, n_splits),
    }

    # Save one partition per client under data/clients/<i>/mnist.pt
    for i in range(n_splits):
        subdir = f'{out_dir}/clients/{str(i+1)}'
        if not os.path.exists(subdir):
            os.mkdir(subdir)
        torch.save({
            'x_train': data['x_train'][i],
            'y_train': data['y_train'][i],
            'x_test': data['x_test'][i],
            'y_test': data['y_test'][i],
        },
            f'{subdir}/mnist.pt')


if __name__ == '__main__':
fire.Fire(split)