-
Notifications
You must be signed in to change notification settings - Fork 490
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DRAFT] Update datasets to include download, dataset metaclass, dataset objects #299
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll make a more thorough review later but it looks great at first glance! Let's use pathlib instead of os.path.
downloaded_files = os.listdir(test_data_dir) | ||
for split in ['train','test']: | ||
for res in [resolution, resolution*2]: | ||
assert f"darcy_{split}_{res}.pt" in downloaded_files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do something more than checking for existence? An MD5 check would be reassuring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The MD5 check happens in the course of downloading
# TODO @Jean: do you think there's a better name for this base class? | ||
# i/o pairs of 'x', 'y' stored as .pt files, but the distinguishing feature of this dataset | ||
# isn't the file format, it's the attributes in its data batches | ||
class PTDataset: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re: better name
Could we reuse the old name "[Base]PDEDataset"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might prefer PDEDataset too, let me see if I can easily turn Burger's into this form too
Connected to #280: we previously discussed adding
torchvision
-style dataset files that include the ability to download data for sample problems, e.g. Navier-Stokes or Darcy flow. This PR will add this functionality as well as:PTDataset
classAt this stage it's very open to feedback. Would love to hear what people think of the spec so far/ideas for how to improve it!
Currently I have a working
DarcyDataset
that pulls data from a Zenodo archive, linked here.