Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optional arguments to override RasterDataset's filenames #1524

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

metazool
Copy link

A project that we are collaborating on has a scenario here where it would be very useful to be able to set the filename_glob and filename_regex arguments to a RasterDataset subclass when it's being initialised, rather than always depend on them being set as class attributes.

In our scenario we could be creating a PairedDataset (a RasterDataset subclass) out of many different, user-supplied RasterDataset types:

return PairedDataset(
                dataset_class,
                root=root,
                transforms=_transformations,
                **subdataset_params["params"],
            )

This PR suggests a change to replace the class attributes for filename_glob and filename_regex with optional arguments, if they're set, plus a small associated test.

@github-actions github-actions bot added datasets Geospatial or benchmark datasets testing Continuous integration testing labels Aug 17, 2023
@metazool
Copy link
Author

@microsoft-github-policy-service agree company="Ordnance Survey"

@adamjstewart
Copy link
Collaborator

We've actually had a lot of users ask for this feature. I guess my only question is, why stop at RasterDataset.filename_*? I'm wondering if other users will request the same thing for other attributes or classes.

My original thinking when I designed this was that RasterDataset and friends were basically private base classes used to define the rest of our datasets that users actually use. Since then, a lot of users have found RasterDataset useful for generic raster data and have started using that instead of defining new subclasses.

I'd have to think more about your use case, but I think it's still possible to create subclasses on-the-fly with the new filename_* attributes, but I imagine it would be easier to be able to pass these in as arguments instead. I'll have to think about whether or not a large redesign is required if we want to support this more generally for all GeoDatasets.

@metazool
Copy link
Author

Thank you for the response @adamjstewart . In our case we have a RasterDataset subclass which wraps dynamically around multiple instances of another one, which has the class attributes set as you'd expect.

You can see the current workaround for this issue here, if you're looking for usage flavours: https://github.com/Pale-Blue-Dot-97/Minerva/blob/bc6aa3433336a715d7bb0ef41404b6fe1c6773a4/minerva/datasets.py#L230

I can see the rationale for "why only filename_*" prompting contemplation of a redesign, this is intended as the smallest non-harmful change...

@calebrob6
Copy link
Member

Hey @metazool, did #1442 fix this for you?

@adamjstewart
Copy link
Collaborator

Ping @metazool

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datasets Geospatial or benchmark datasets testing Continuous integration testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants