
Table notation for reproducibility #35

Open
AntonioCarta opened this issue Dec 7, 2022 · 10 comments

@AntonioCarta commented Dec 7, 2022

I propose to switch the notation. Right now we have:

  • ✅ Reproduced
  • ❌ Custom setup
  • bug for bugs

IMO, this is very confusing at first glance. If I see a big red cross, I immediately think there is a problem with the strategy. In that case, however, everything is actually correct; we just changed some hyperparameters or tested a new benchmark.

Instead we could have two separate columns (rough sketch below):

  • Reproduced: ✅ if correct, ❌ if bugged
  • Reference: a link to the paper, or an Avalanche or custom tag if not using any paper.
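
For illustration, a row could look something like this (strategy names and the link are placeholders, not actual entries):

```
| Strategy     | Reproduced | Reference                                 |
|--------------|------------|-------------------------------------------|
| <strategy A> | ✅         | [paper](https://arxiv.org/abs/xxxx.xxxxx) |
| <strategy B> | ✅         | Avalanche                                 |
| <strategy C> | ❌         | [paper](https://arxiv.org/abs/xxxx.xxxxx) |
```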
@AndreaCossu

The current meaning is actually different:

  • tick (✅) = we are able to reproduce the target performance of the reference paper (not necessarily with the same setup as the reference paper).
  • cross (❌) = we are not able to reproduce the target performance of the reference paper; we do not know whether this is due to a bug in the strategy.
  • bug = we are certainly not able to reproduce the target performance of the reference paper, due to a bug in the strategy.

@AntonioCarta

Ok, I misunderstood the notation. Maybe we should add how far we are from the target result?

@AndreaCossu

Yes, we can. I didn't want to clutter the table so I put the reference performance inside the comments in the experiments.
I think we could create a separate table in the README to briefly show the gap.
I also created issue #33 to keep track of what's missing. I could also add the gap there.

@AntonioCarta

Maybe we need to strictly separate two types of experiments:

  • paper reproductions, which exactly reproduce a paper
  • baselines, which provide clean implementations but may reach lower accuracy.

IMO CLB is still valuable as long as the methods in Avalanche are correct and the clean implementation provides a reasonable reference value. Reproducing papers requires digging into whatever tricks the authors decided to add. While useful, it's very time-consuming and we cannot afford to do it ourselves, as we have already seen. Of course, we can support external contributions on this.

@AndreaCossu

With paper reproductions, do you also mean the same hyperparameters as the original paper? In the end, I think that is less interesting (and we would only have a few strategies marked as such). One would probably use CL baselines to understand how to reach the same performance as the original paper, even though hyperparameters may differ. I guess that better describes the concept of reproducibility when you use a different codebase than the one you are trying to reproduce.

@AntonioCarta

> With paper reproductions, do you also mean the same hyperparameters as the original paper?

Same performance, scenario, model architectures, and so on. Some hyperparameters (lr, regularization strength) may change due to minor differences in the framework/implementation.

@AndreaCossu

I changed the table in the README. It now shows Avalanche when the experiment is not present in a specific paper. I also added the reference performance with the related paper (when available).
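
Roughly, the structure is like this (strategy names and numbers here are just placeholders, not the actual README values):

```
| Strategy     | Benchmark     | Reference                            |
|--------------|---------------|--------------------------------------|
| <strategy A> | <benchmark A> | xx.x% acc. in [paper](https://...)   |
| <strategy B> | <benchmark B> | Avalanche                            |
```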

@AntonioCarta commented Mar 16, 2023

This is a nice improvement. Do we have any explanation for the gaps in some experiments? E.g. different hyperparameters, fewer epochs, ...

@AndreaCossu

Not really; we can speculate, but nothing more at the moment.

@AntonioCarta

It's fine, but we should keep track of this somewhere: at least a log of attempts and some notes about what failed. I'm not sure about the exact form; a comment in the header of the script may be enough.

For example, maybe we find out that the difference is due to a mistake in the original paper (e.g. they look at the validation instead of the test loss). In that case, we should explain the reason behind the performance difference.
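
For instance, a minimal header note could look like this (all names and numbers are placeholders):

```
# Reproducibility notes -- <strategy> on <benchmark>
# Target: xx.x% average accuracy, from <paper> (Table X)
# Current result: xx.x% with this script
# Attempts / known differences from the paper:
#   - <date>: tried <change>; result xx.x%
#   - suspected cause of the remaining gap: <hypothesis>
```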
