WIP: Documentation & Typos #1629

Draft
wants to merge 6 commits into master
Conversation

@jneuendorf commented Mar 21, 2024

This PR is still in progress and not yet "clean for review"; it contains just some things I stumbled across. I opened it early so you know there is progress on certain aspects that others then don't have to worry about 😉

Relates to #886

@AntonioCarta (Collaborator)

Thank you! Ping me once you are ready for a review.

@jneuendorf marked this pull request as draft on March 23, 2024, 13:37
@jneuendorf (Author) commented Mar 23, 2024

@AntonioCarta Thanks for the quick reply!

I have a question not directly related to documentation: in a codebase using a past Avalanche version there is this import: from avalanche.benchmarks.utils import make_classification_dataset. From what I could see in the git history, this function was deprecated and no longer exists in Avalanche. What is the equivalent in the current version (0.5.0)?

from avalanche.benchmarks.utils import as_classification_dataset seems to be the successor. However, it's not quite clear to me how it can be used equivalently: _init_transform_groups was used in make_classification_dataset and is still a "private" function in the current code, but there is no public function that uses it (only private ones, e.g. _make_taskaware_classification_dataset). So the question is:

What's the reason that only transform_groups is passed to as_classification_dataset instead of all the kwargs of make_classification_dataset? Would you be fine with reintroducing the former kwargs and calling _init_transform_groups? Otherwise, I think the best alternative is to make _init_transform_groups public so that users can call

as_classification_dataset(
  dataset,
  transform_groups=init_transform_groups(
    target_transform=lambda x: x,  # or whatever
    ...
  ),
)

This could also be achieved via a class method (the constructor seems a bit complex already), e.g.

as_classification_dataset(
  dataset,
  transform_groups=TransformGroups.create(
    target_transform=lambda x: x,  # or whatever
    ...
  ),
)

Kind regards 🙂

PS: Providing a dict to as_classification_dataset as in the docs does not comply with the TransformGroup type.

@AntonioCarta (Collaborator)

Thank you for highlighting these issues. The methods changed over time because at first AvalancheDataset only supported classification datasets. Now, a classification dataset is an AvalancheDataset where:

  • the data returns triplets <x, y, t>
  • it has a targets DataAttribute
  • it has a targets_task_labels DataAttribute

Therefore, the goal of as_classification_dataset should be to make sure that the data has the correct attributes that are expected from classification datasets. Transformation groups in theory should be defined before this step. But I agree that it may be useful to define them at this stage as long as it's not too complex or confusing.
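
For concreteness, here is a minimal sketch of building such a dataset by hand. The AvalancheDataset and DataAttribute signatures below are assumed from the 0.5.x code; treat this as illustrative, not as the official recipe:

import torch
from torch.utils.data import TensorDataset

from avalanche.benchmarks.utils import AvalancheDataset
from avalanche.benchmarks.utils.data_attribute import DataAttribute

xs = torch.randn(10, 3)
ys = torch.randint(0, 2, (10,))
plain = TensorDataset(xs, ys)

data = AvalancheDataset(
    [plain],
    data_attributes=[
        # exposes data.targets
        DataAttribute(ys.tolist(), "targets"),
        # exposes data.targets_task_labels; use_in_getitem=True appends the
        # task label to each sample, yielding <x, y, t> triplets
        DataAttribute([0] * len(plain), "targets_task_labels", use_in_getitem=True),
    ],
)

x, y, t = data[0]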

Would you be fine with reintroducing the former kwargs and calling _init_transform_groups? Otherwise, I think the best alternative is to make _init_transform_groups public so users can call

I think it makes sense to add the arguments to as_classification_dataset. Transformation groups are a bit verbose to create from scratch.
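
For reference, a sketch of what "from scratch" currently looks like, assuming the TransformGroups constructor from avalanche.benchmarks.utils.transform_groups in 0.5.x (the exact signature is an assumption):

from torchvision.transforms import ToTensor
from avalanche.benchmarks.utils.transform_groups import TransformGroups

# one entry per group name; a value may be a single transform or a
# (transform, target_transform) pair
tg = TransformGroups(
    {"train": ToTensor(), "eval": ToTensor()},
    current_group="train",
)

The result would then be passed as as_classification_dataset(dataset, transform_groups=tg).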

Otherwise, I think the best alternative is to make _init_transform_groups

I like this less. Classification transform groups are different from other transformations, and TransformGroups tries to be agnostic to these differences (as much as possible). Also, it would not solve the verbosity problem.

PS: Providing a dict to as_classification_dataset as in the docs does not comply to the TransformGroup type.

This is probably a mistake from the earlier API version.

@coveralls
Pull Request Test Coverage Report for Build 8382229381

Details

  • 1 of 1 (100.0%) changed or added relevant line in 1 file is covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 51.806%

Totals (Coverage Status)

  • Change from base Build 8098020118: 0.0%
  • Covered Lines: 14756
  • Relevant Lines: 28483

💛 - Coveralls

@AntonioCarta (Collaborator)

Btw, if you have other comments about usability or pain points in the API/documentation, do let me know. Fixing these issues takes a lot of time, so we can't do everything quickly, but we try to keep track of them and improve over time.

@AntonioCarta (Collaborator)

Hi @jneuendorf, I see that the PR is still a draft, but if there are no blocking issues I think we could merge it. Doc improvements can easily be split into multiple PRs.

@jneuendorf (Author)

Hi @jneuendorf, I see that the PR is still a draft, but if there are no blocking issues I think we could merge it. Doc improvements can easily be split into multiple PRs.

Hi, your GitHub comment email got lost among the other GitHub notifications (Coveralls etc.) 🙈 You're right that some things could already be merged, since later amendments can go into another PR.

So feel free to review/cherry-pick the changes and decide which ones are ok for you. 😉

@jneuendorf (Author) commented May 23, 2024

I currently have a problem with training a benchmark from a custom dataset:

My custom dataset is a PyTorch TensorDataset that implements ISupportedClassificationDataset, i.e. it provides the targets attribute.

from functools import cached_property

import torch
from avalanche.benchmarks.utils.dataset_definitions import ISupportedClassificationDataset  # import path assumed for 0.5.x

class ClassificationMixin(torch.utils.data.Dataset, ISupportedClassificationDataset):
    @cached_property
    def targets(self):
        return []  # whatever...

class TensorDataset(torch.utils.data.TensorDataset, ClassificationMixin):
    ...

This way, it (and its modified splits) is compatible with as_classification_dataset(...). However, when I try to train the following benchmark

# import paths assumed for avalanche 0.5.x
from avalanche.benchmarks.scenarios.supervised import class_incremental_benchmark
from avalanche.benchmarks.utils import as_classification_dataset

benchmark = class_incremental_benchmark(
    {
        'train': as_classification_dataset(train_dataset),
        'test': as_classification_dataset(test_dataset),
    },
    num_experiences=1,
)

I get the following error:

...
    cl_strategy.train(experience)
  File "avalanche/avalanche/training/templates/base_sgd.py", line 211, in train
    super().train(experiences, eval_streams, **kwargs)
  File "avalanche/avalanche/training/templates/base.py", line 162, in train
    self._before_training_exp(**kwargs)
  File "avalanche/avalanche/training/templates/base_sgd.py", line 291, in _before_training_exp
    self.make_train_dataloader(**kwargs)
  File "avalanche/avalanche/training/templates/base_sgd.py", line 456, in make_train_dataloader
    self.dataloader = TaskBalancedDataLoader(
                      ^^^^^^^^^^^^^^^^^^^^^^^
  File "avalanche/avalanche/benchmarks/utils/data_loader.py", line 404, in __init__
    task_labels_field = getattr(data, "targets_task_labels")
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'ClassificationDataset' object has no attribute 'targets_task_labels'

For me, this is a hint that there is some protocol mismatch: class_incremental_benchmark accepts a dataset created by as_classification_dataset, but training on it doesn't work. So am I correct that

  • either ISupportedClassificationDataset (and/or similar protocols) should also require a targets_task_labels attribute,
  • or the base SGD template should not rely on a data loader that depends on targets_task_labels

?
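
Until that is resolved, a workaround sketch along the lines of the attribute construction above might unblock training. The AvalancheDataset/DataAttribute usage is assumed from the 0.5.x code, and with_constant_task_label is a hypothetical helper, not avalanche API:

from avalanche.benchmarks.utils import AvalancheDataset
from avalanche.benchmarks.utils.data_attribute import DataAttribute

def with_constant_task_label(dataset, task_label=0):
    # hypothetical helper: wrap e.g. the result of as_classification_dataset(...)
    # and attach the missing targets_task_labels attribute
    return AvalancheDataset(
        [dataset],
        data_attributes=[
            DataAttribute(
                [task_label] * len(dataset),
                "targets_task_labels",
                use_in_getitem=True,  # yield <x, y, t> triplets
            ),
        ],
    )

The wrapped datasets would then be passed to class_incremental_benchmark as before.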
