Git Merge

Implementation in PyTorch of the weight matching algorithm presented in "Git Re-Basin: Merging Models modulo Permutation Symmetries".

Description

This project is part of my final exam for the Neural Network 2023 class at Sapienza University.

This repository contains a main notebook, some model weights, and some saved objects. The use of these objects is explained in the main notebook, GIT_MERGE. The notebook also contains a short report and some personal considerations.

The implemented solutions are original (written without reference to the original code base) and fully working on the main problem.

Usage

The notebook contains two main sections:

the definition of some functions
the experiment part

In more detail, the first part contains the following sections:

import the libraries [import lib]
download the datasets [Dataset and Dataloader]
create the Network class [Models definiton & utility]
define all the functions for the permutation part [Permutation utility]

After this, it is possible to train any desired MLP (as shown in the experiment part) and merge it with another MLP that has the same architecture. There are two main ways to use the permutation utility:

Evaluating the compatibility between two models (obtain the permutation explicitly):

model_A, model_B = Net(), SNet()
model_A, model_B = model_A.to(device), model_B.to(device)

loss_fn = nn.CrossEntropyLoss()

#train model A
#train model B

params_A, params_B = dict(model_A.named_parameters()), dict(model_B.named_parameters())
params_A, params_B = detach_parameters(params_A), detach_parameters(params_B)

P = get_empty_P_as_dict(params_B)
A_funct = get_A_functs(params_A, params_B)
info_perm = permutation_model_info(params_B)

new_P = weight_matching(A_funct, P)
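
To make the idea behind the calls above more concrete, here is a minimal NumPy sketch of weight matching for a single hidden layer. It is not the notebook's `weight_matching` code: the function names (`match_hidden_units`, `permute_hidden_units`, `forward`) are hypothetical, and an exhaustive search over permutations stands in for the linear assignment solver that a real implementation would use, so it only scales to a handful of hidden units.

```python
import itertools
import numpy as np


def match_hidden_units(w1_a, b1_a, w2_a, w1_b, b1_b, w2_b):
    """Find the ordering of B's hidden units that best matches A's.

    Shapes: w1 is (hidden, in), b1 is (hidden,), w2 is (out, hidden).
    Returns perm such that hidden unit perm[i] of B is paired with unit i of A.
    """
    # similarity[i, j]: inner product between everything attached to unit i
    # of A and unit j of B (incoming weights, bias, outgoing weights).
    similarity = w1_a @ w1_b.T + np.outer(b1_a, b1_b) + w2_a.T @ w2_b
    rows = np.arange(similarity.shape[0])
    # Exhaustive search (assumption: tiny layer); keeps the sketch dependency-free.
    best = max(itertools.permutations(range(len(rows))),
               key=lambda p: similarity[rows, list(p)].sum())
    return np.array(best)


def permute_hidden_units(w1, b1, w2, perm):
    """Reorder hidden units consistently in both adjacent layers."""
    return w1[perm], b1[perm], w2[:, perm]


def forward(x, w1, b1, w2, b2):
    """One-hidden-layer ReLU MLP."""
    return np.maximum(x @ w1.T + b1, 0.0) @ w2.T + b2


# Toy check: model B is model A with its hidden units shuffled.
rng = np.random.default_rng(0)
w1_a, b1_a = rng.normal(size=(4, 3)), rng.normal(size=4)
w2_a, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
shuffle = rng.permutation(4)
w1_b, b1_b, w2_b = permute_hidden_units(w1_a, b1_a, w2_a, shuffle)

perm = match_hidden_units(w1_a, b1_a, w2_a, w1_b, b1_b, w2_b)
w1_p, b1_p, w2_p = permute_hidden_units(w1_b, b1_b, w2_b, perm)
```

In this toy setup the matching undoes the shuffle, so the permuted weights of B coincide with those of A, while B computes the same function before and after the permutation.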

Simple interpolation of two models (git merge, directly obtain a new model):

model_a = Net() 
# train model_a
model_b = Net()
# train model_b
permuted_model = git_merge(model_a, model_b, silent=True)
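
The interpolation step itself can be sketched as follows. This is only an illustration under assumptions: the README does not show how `git_merge` works internally, `interpolate_params` is a hypothetical helper, and small NumPy arrays stand in for the detached parameter tensors (the same arithmetic works on tensors).

```python
import numpy as np


def interpolate_params(params_a, params_b, alpha):
    """Return (1 - alpha) * A + alpha * B for every named parameter."""
    if params_a.keys() != params_b.keys():
        raise ValueError("both models must have the same parameter names")
    return {name: (1.0 - alpha) * params_a[name] + alpha * params_b[name]
            for name in params_a}


# Toy parameters (hypothetical names) for two models with the same architecture.
toy_a = {"fc.weight": np.zeros((2, 2)), "fc.bias": np.zeros(2)}
toy_b = {"fc.weight": np.ones((2, 2)), "fc.bias": np.full(2, 2.0)}

midpoint = interpolate_params(toy_a, toy_b, alpha=0.5)
```

With `alpha=0.0` the result equals model A and with `alpha=1.0` it equals model B. Interpolating is only meaningful after model B has been permuted to align with model A.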

To evaluate the resulting permutation, use:

coef_vector = np.linspace(start=0.0, stop=1.0, num=10)
eval_dict_non_permuted = evaluate_interpolation(data_loaders['choosen_dataset'],
                                            loss_fn,
                                            model_A,
                                            model_B,
                                            coef_vector, device=device)

eval_dict_permuted = evaluate_interpolation(data_loaders['choosen_dataset'],
                                            loss_fn,
                                            model_A,
                                            permuted_B,
                                            coef_vector, device=device)
plot_barrier(eval_dict_non_permuted, eval_dict_permuted, coef_vector, 'title')
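
The quantity such a plot is meant to show can be summarised by a single number, the loss barrier. The sketch below uses one common definition (highest loss along the interpolation path minus the mean loss of the two endpoint models). It assumes the evaluation results can be reduced to a plain list of losses ordered like `coef_vector`; the actual structure of the dictionaries returned by `evaluate_interpolation` is not documented here, and `loss_barrier` is a hypothetical helper.

```python
def loss_barrier(losses):
    """Loss barrier along a linear interpolation path.

    losses[0] belongs to model A, losses[-1] to model B, and the entries in
    between to the interpolated models, in the order of the coefficients.
    """
    if len(losses) < 2:
        raise ValueError("need at least the two endpoint losses")
    endpoint_mean = 0.5 * (losses[0] + losses[-1])
    return max(losses) - endpoint_mean


# Made-up numbers for illustration only, not measured results.
example_losses = [0.1, 0.5, 0.9, 0.4, 0.3]
example_barrier = loss_barrier(example_losses)
```

A barrier close to zero for the permuted pair, compared with a clearly positive barrier for the non-permuted pair, is what a successful matching would look like.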

There is also a part on merging multiple models, but I was unable to reproduce the paper's performance for that part.

PS: the dataloaders and the datasets are stored in dictionaries. Check the function bodies to see whether the input should be a dictionary (run on both the train and test sets) or a single dataloader.
