Specialize benchmark creation helpers #1397

Open
wants to merge 20 commits into
base: master
Conversation

@lrzpellegrini (Collaborator) commented May 31, 2023

This PR introduces the ability to customize the class of the benchmark objects returned by generic benchmark creation functions (such as dataset_benchmark, paths_benchmark, ...).

In addition, new problem-specific functions (dataset_classification_benchmark, dataset_detection_benchmark, ...) are introduced to explicitly cover classification and detection setups. Generic functions will still return ClassificationScenario instances if datasets with classification targets are detected (ints), but they now will display a warning suggesting the use of its classification counterpart.
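For illustration, a minimal sketch of the intended usage (the import path and exact signature of the new helper are assumed here and may differ from the actual implementation):

import torch
from torch.utils.data import TensorDataset
from avalanche.benchmarks.generators import dataset_benchmark
from avalanche.benchmarks import dataset_classification_benchmark  # assumed import path


class ToySet(TensorDataset):
    """Toy dataset exposing int `targets` (and task labels) so they can be auto-detected."""

    def __init__(self, n=16, n_classes=5):
        x, y = torch.randn(n, 4), torch.randint(0, n_classes, (n,))
        super().__init__(x, y)
        self.targets = y.tolist()
        self.targets_task_labels = [0] * n  # optional; here all patterns belong to task 0


train_sets, test_sets = [ToySet(), ToySet()], [ToySet(), ToySet()]

# Generic helper: still returns a ClassificationScenario when int targets are
# detected, but now warns and suggests the classification-specific helper.
bm_generic = dataset_benchmark(train_sets, test_sets)

# Problem-specific helper introduced by this PR.
bm_classification = dataset_classification_benchmark(train_sets, test_sets)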

The `*_scenario` functions, which were deprecated quite a long time ago, have been removed.

This PR also fixes a problem with CORe50-NC, which did not have fields found in NCScenario.

Minor addition: this also solves some warnings raised by sphinx when generating the doc files. The structure of some macro-sections has been slightly reworked.

Fixes #774

@coveralls commented May 31, 2023

Pull Request Test Coverage Report for Build 5268393053

  • 417 of 538 (77.51%) changed or added relevant lines in 30 files are covered.
  • 43 unchanged lines in 5 files lost coverage.
  • Overall coverage increased (+0.7%) to 73.292%

Changes missing coverage (covered lines / changed or added lines, %):
  • avalanche/benchmarks/classic/endless_cl_sim.py: 2 / 3 (66.67%)
  • avalanche/benchmarks/classic/openloris.py: 2 / 3 (66.67%)
  • avalanche/benchmarks/classic/stream51.py: 3 / 4 (75.0%)
  • avalanche/benchmarks/generators/benchmark_generators.py: 14 / 15 (93.33%)
  • avalanche/benchmarks/scenarios/classification_benchmark_creation.py: 15 / 16 (93.75%)
  • avalanche/benchmarks/scenarios/detection_benchmark_creation.py: 13 / 14 (92.86%)
  • avalanche/benchmarks/utils/classification_dataset.py: 18 / 19 (94.74%)
  • tests/distributed/test_distributed_helper.py: 1 / 2 (50.0%)
  • avalanche/benchmarks/classic/clear.py: 3 / 5 (60.0%)
  • avalanche/benchmarks/classic/ctrl.py: 1 / 4 (25.0%)

Files with coverage reduction (new missed lines, %):
  • tests/benchmarks/test_flat_data.py: 1 (99.39%)
  • avalanche/benchmarks/scenarios/generic_benchmark_creation.py: 2 (83.93%)
  • avalanche/benchmarks/utils/detection_dataset.py: 2 (41.82%)
  • avalanche/benchmarks/utils/utils.py: 17 (73.2%)
  • avalanche/benchmarks/utils/flat_data.py: 21 (90.94%)

Totals:
  • Change from base Build 5257858549: +0.7%
  • Covered lines: 16586
  • Relevant lines: 22630

💛 - Coveralls

@lrzpellegrini lrzpellegrini marked this pull request as ready for review May 31, 2023 13:25
@AntonioCarta (Collaborator) commented May 31, 2023

IMO reusing the same organization of the classification generators is a bit clunky for the generic ones. If we assume that the input is an AvalancheDataset we can remove the transform arguments. IMO the generic generators should:

  • create a stream/benchmark from a list of AvalancheDatasets
  • create a stream/benchmark given an AvalancheDataset and an attribute to use for splitting it (e.g. a domain label which is a DataAttribute)

The usage should go like this:

train_data, test_data = my_custom_data()
da = DataAttribute("domain", ...)
train_data = AvalancheDataset(train_data, data_attributes=[da])
# same for test

bm = benchmark_from_dataset(train_data, test_data, split_by="domain")

Comment on lines +87 to +88
test_generator: LazyStreamDefinition,
*,
Collaborator

Is there any reason to keep the LazyStreamDefinition? Maybe we should drop it and just use a generator of experiences now. I don't see a particular advantage to delaying the stream creation this way. It's OK to keep this internal and remove it later.

Comment on lines +97 to +98
def create_lazy_detection_benchmark(
train_generator: LazyStreamDefinition,
Collaborator

remove?

Comment on lines +289 to +290
train_datasets: Sequence[GenericSupportedDataset],
test_datasets: Sequence[GenericSupportedDataset],
Collaborator

Can we simplify this to require an AvalancheDataset? This way we remove the need to pass transform/task labels...

Comment on lines 143 to 144
def make_detection_dataset(
dataset: SupportedDetectionDataset,
Collaborator

Do we need this? DetectionDataset should be enough. We have the others for classification mostly to keep compatibility with the old API.

Collaborator Author

It's also a matter of allowing:

  • On the Avalanche side, to automatically fetch the targets from the dataset
  • On the user side, to set the targets and task labels

The alternative option is setting the targets and task labels through data_attributes, but that seems like an advanced topic (see the sketch below).
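For reference, the data_attributes route would look roughly like this (a sketch; the DataAttribute argument order and the attribute names "targets"/"targets_task_labels" are assumptions):

import torch
from torch.utils.data import TensorDataset
from avalanche.benchmarks.utils import AvalancheDataset
from avalanche.benchmarks.utils.data_attribute import DataAttribute

raw = TensorDataset(torch.randn(8, 4), torch.randint(0, 2, (8,)))
labels = raw.tensors[1].tolist()

# Attach targets and task labels explicitly as data attributes
# (attribute names and argument order are assumed, not confirmed).
targets = DataAttribute(labels, "targets")
task_labels = DataAttribute([0] * len(labels), "targets_task_labels")

data = AvalancheDataset(raw, data_attributes=[targets, task_labels])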

Comment on lines 303 to 304
def detection_subset(
dataset: SupportedDetectionDataset,
Collaborator

Do we need this? DetectionDataset should be enough. We have the others for classification mostly to keep compatibility with the old API.

avalanche/benchmarks/utils/utils.py: 3 outdated review comments (resolved)
return found_targets


def make_generic_dataset(
Collaborator

Do we need it? Also, it's not generic because it still asks for targets.

Collaborator Author

I think it's still useful to load targets and task labels if they are available. Targets and task labels are optional (it's a best-effort search).

return data


def make_generic_tensor_dataset(
Collaborator

do we need it?

@AntonioCarta (Collaborator)

The same goes for the dataset methods. We don't really need those. Instead of doing

data = make_generic_tensor_data(...)

I think it's easier to do

data = TensorDataset(...) # pytorch data
data = AvalancheDataset(data)

We don't need to know how the dataset is made internally.
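Spelled out with imports, the direct route would be something like this (a minimal sketch, assuming AvalancheDataset can wrap a plain PyTorch dataset directly):

import torch
from torch.utils.data import TensorDataset
from avalanche.benchmarks.utils import AvalancheDataset

# A plain PyTorch tensor dataset...
raw = TensorDataset(torch.randn(16, 4), torch.randint(0, 3, (16,)))

# ...wrapped directly, without a tensor-specific constructor.
data = AvalancheDataset(raw)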

@ContinualAI-bot (Collaborator)

Oh no! It seems there are some PEP8 errors! 😕
Don't worry, you can fix them! 💪
Here's a report about the errors and where you can find them:

avalanche/logging/wandb_logger.py:246:21: E127 continuation line over-indented for visual indent
avalanche/logging/wandb_logger.py:247:21: E127 continuation line over-indented for visual indent
2       E127 continuation line over-indented for visual indent

@lrzpellegrini (Collaborator, Author)

I agree with most of the things you pointed out. I was hoping to keep the scope of this PR more narrow 😅.

  • I agree that we can remove some generic benchmark constructors (like the tensor one). However, they should undergo a deprecation cycle before completely removing them. Users may be using them.
  • As for transformation parameters, I think it's better if we keep the classic train_transform and eval_transform parameters (see the sketch right after this list). The alternative way is populating the train and eval transformation groups of the datasets before calling the constructor, but this requires knowing what a transformation group is (I suspect most users don't know about them).
  • We can remove the lazy constructors (or create all experiences eagerly internally), but that will prevent users from creating experience datasets on the go. In some setups, this may be a problem memory-wise.
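A minimal sketch of the classic transform parameters mentioned in the second point, assuming dataset_benchmark keeps accepting train_transform/eval_transform (the toy dataset below is only for illustration):

import torch
from torch.utils.data import TensorDataset
from torchvision import transforms
from avalanche.benchmarks.generators import dataset_benchmark


class ToyImageSet(TensorDataset):
    """Toy image-like dataset exposing `targets` so labels can be auto-detected."""

    def __init__(self, n=8, n_classes=2):
        x, y = torch.randn(n, 1, 8, 8), torch.randint(0, n_classes, (n,))
        super().__init__(x, y)
        self.targets = y.tolist()
        self.targets_task_labels = [0] * n  # optional; all patterns in task 0


bm = dataset_benchmark(
    [ToyImageSet(), ToyImageSet()],  # one dataset per train experience
    [ToyImageSet(), ToyImageSet()],  # one dataset per test experience
    train_transform=transforms.RandomHorizontalFlip(),  # augmentation on the train stream
    eval_transform=None,  # no augmentation at evaluation time
)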

@AntonioCarta (Collaborator)

I agree that we can remove some generic benchmark constructors (like the tensor one). However, they should undergo a deprecation cycle before completely removing them. Users may be using them.

I don't understand. We don't have generic benchmarks now. We only have the classification benchmarks, which we use for everything.

As for transformation parameters, I think it's better if we keep the classic train_transform and eval_transform parameters. The alternative way is populating the train and eval transformation groups of the datasets before calling the constructor, but this requires knowing what a transformation group is (I suspect most users don't know about them).

I agree. Maybe we should fix this and have a simpler API to manage transformations? Combining it with the benchmark creation doesn't seem like a great choice for the generic benchmarks.

We can remove the lazy constructors (or create all experiences eagerly internally), but that will prevent users from creating experience datasets on the go. In some setups, this may be a problem memory-wise.

We can create lazy streams of experiences, like we do for OCL, where experiences themselves are created on-the-fly.
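As a plain-Python illustration of the idea (not an Avalanche API), a lazy stream can simply be a generator that materializes each experience only when it is consumed; build_dataset_from_paths below is a hypothetical user-provided factory:

from typing import Callable, Iterable, Iterator


def lazy_stream(experience_specs: Iterable, make_experience: Callable) -> Iterator:
    """Yield one experience at a time; `make_experience` builds it on demand."""
    for spec in experience_specs:
        # Only the current experience dataset needs to live in memory.
        yield make_experience(spec)


# e.g.: train_stream = lazy_stream(list_of_path_lists, build_dataset_from_paths)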

@lrzpellegrini (Collaborator, Author)

We have DatasetScenario, which is quite generic. The ClassificationScenario is a subclass of DatasetScenario with a couple of additional fields regarding the classes timeline.

DatasetScenario -> ClassesTimelineCLScenario (adds class timeline fields) -> ClassificationScenario

similarly, for detection:

DatasetScenario -> ClassesTimelineCLScenario -> DetectionScenario

In theory, one could create a regression/"other problem" benchmark using DatasetScenario instead of ClassificationScenario, which is more correct.

@AntonioCarta (Collaborator)

We have DatasetScenario, which is quite generic

This is a class; I'm referring to benchmark generators. We don't have generic ones because they all expect targets and task labels.

In theory, one could create a regression/"other problem" benchmark using DatasetScenario instead of ClassificationScenario, which is more correct.

This is the kind of API that I want to avoid. For every type of dataset/problem and every type of CL scenario we will need a different generator, or a super general and super complex one.

ClassesTimelineCLScenario (adds class timeline fields)

Keep in mind that we don't really have a use case for the timeline fields. I think it's best to keep the benchmark itself as simple as possible (i.e. a dumb collection of streams). Same for streams (a dumb collection of experiences).

The idea of splitting dataset and stream creation tries to reduce this complexity by separating concerns into different methods. I'm open to other proposals, but only in the direction of reducing complexity.

@lrzpellegrini (Collaborator, Author)

I think it may be a good idea to re-design the benchmark generation part a bit. I'm putting this PR on hold until we devise a more general solution.

Labels
None yet
Projects
None yet
Development
Successfully merging this pull request may close these issues: Core50 benchmark with ICARL strategy
4 participants