Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

earthaccess.download() local_path parameter is typed as Optional, but doesn't have a default value #551

Open
mfisher87 opened this issue Apr 30, 2024 · 12 comments
Labels
bug Something isn't working

Comments

@mfisher87
Copy link
Member

We think it should default to the current directory so if a user runs earthaccess.download(results) the files go somewhere somewhat intuitive.

@mfisher87 mfisher87 added the bug Something isn't working label Apr 30, 2024
@chuckwondo
Copy link
Collaborator

There is already a default, which shows up here: https://github.com/nsidc/earthaccess/blob/main/earthaccess/store.py#L467-L470

In earthaccess.download, you could just add = None to the local_path parameter to make the typing correct.

@mfisher87
Copy link
Member Author

Ah nice, even better :)

Do we like that choice (./data/YYYY-MM-DD-abcdef)? Should the name "earthaccess" be in any of the directory names to help users identify it?

The thing that rubs me the wrong way about that default is the data/ parent directory, it's not unlikely the user already has that directory and we'd be writing in to it alongside other things. I'd prefer something more esoteric but identifiable like ./_earthaccess_download_YYYY-MM-DD_abcdef, but not strongly. I feel it's OK to pollute the user's current working directory with one directory per execution of earthaccess; we don't want to encourage this usage pattern anyway.

@chuckwondo
Copy link
Collaborator

Ah nice, even better :)

Do we like that choice (./data/YYYY-MM-DD-abcdef)? Should the name "earthaccess" be in any of the directory names to help users identify it?

The thing that rubs me the wrong way about that default is the data/ parent directory, it's not unlikely the user already has that directory and we'd be writing in to it alongside other things. I'd prefer something more esoteric but identifiable like ./_earthaccess_download_YYYY-MM-DD_abcdef, but not strongly. I feel it's OK to pollute the user's current working directory with one directory per execution of earthaccess; we don't want to encourage this usage pattern anyway.

I agree. I don't like the data parent directory either. I'd have to think a bit more about whether or not to put "earthaccess" in the name. The data itself isn't specific to earthaccess. I could have downloaded some other way. However, I don't have a particularly strong preference either way.

Regarding the suffix, I'd get rid of the uuid and simply use HH-MM-SS in place of uuid. That ensures that the directories have a sensible sort order.

I'm not a fan of defaulting to the current directory, as that makes it all too easy for large files to be accidentally committed to a repo. Perhaps defaulting to ~/.earthaccess/downloads/<datetime>? Of course, the downside to that is perhaps it isn't as obvious as the current dir.

@mfisher87
Copy link
Member Author

Regarding the suffix, I'd get rid of the uuid and simply use HH-MM-SS in place of uuid. That ensures that the directories have a sensible sort order.

💯

@andypbarrett
Copy link
Collaborator

I think ~/.earthaccess/downloads/<datetime> could get confusing for some users.

~/earthaccess_downloads/<datetime> or ~/earthaccess_downloads/<shortname>_<datetime> is an option. The directory is visible and the shortname and datetime would be unique, and interpretable.

@Rapsodia86
Copy link

I think ~/.earthaccess/downloads/<datetime> could get confusing for some users.

~/earthaccess_downloads/<datetime> or ~/earthaccess_downloads/<shortname>_<datetime> is an option. The directory is visible and the shortname and datetime would be unique, and interpretable.

Indeed! From a user point of view, I would like to see a collection/dataset name in a directory. So, if there are many downloads for that time, they won't end up in the same folder.

@jhkennedy
Copy link
Collaborator

jhkennedy commented Apr 30, 2024

My preference, which doesn't align here, would be to just plop stuff in the current working directory if I don't specify a directory. I generally dislike "random" directory structures showing up and find it confusing to find my data in different contexts.

Edit: And I would especially find a new directory showing up in my home directory confusing and wholly removed from the context I'm running the code in.

@chuckwondo
Copy link
Collaborator

My preference, which doesn't align here, would be to just plop stuff in the current working directory if I don't specify a directory. I generally dislike "random" directory structures showing up and find it confusing to find my data in different contexts.

Edit: And I would especially find a new directory showing up in my home directory confusing and wholly removed from the context I'm running the code in.

I can see this perspective as well. I could be convinced to simply default to the current directory. I would actually prefer no default at all, requiring the user to always specify the directory explicitly, but that would present a breaking change at this point. Since we are pre-1.0, that wouldn't be the worst thing to do, but not ideal either.

@jhkennedy
Copy link
Collaborator

jhkennedy commented Apr 30, 2024

To expand, the default only supports the case where you don't specify where the data should go. In that case, when I go looking for the data, I think the current working directory is the most obvious, some sub-directory is still findable, and an entirely different location on my system would be the least obvious.

@mfisher87
Copy link
Member Author

mfisher87 commented Apr 30, 2024

~/earthaccess_downloads/<datetime> or ~/earthaccess_downloads/<shortname>_<datetime> is an option. The directory is visible and the shortname and datetime would be unique, and interpretable.

I don't mind home directory in general, but I really don't like adding a non-dot directory to the root of home. It's a pet peeve :)

In that case, when I go looking for the data, I think the current working directory is the most obvious

Agreed. And I think this is the most platform-agnostic approach. What does adding a directory to a user's "home" looks like on Windows? I don't know :)

I would actually prefer no default at all, requiring the user to always specify the directory explicitly, but that would present a breaking change at this point.

Isn't this the current behavior? Although the type annotations disagree, you must pass in a local_path arg to call earthaccess.download.

@chuckwondo
Copy link
Collaborator

I would actually prefer no default at all, requiring the user to always specify the directory explicitly, but that would present a breaking change at this point.

Isn't this the current behavior? Although the type annotations disagree, you must pass in a local_path arg to call earthaccess.download.

Ha! Yes, that's true. Since there is no default parameter value, callers must supply a value, which is the whole point of this issue (facepalm).

In that case, my preference would be to simply correct the type annotation by removing Optional, but I suspect I'll be outvoted, so I'll semi-reluctantly agree to using the current directory as the default.

@jhkennedy
Copy link
Collaborator

I'm fine with it being a required parameter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Status: 🆕 New
Development

No branches or pull requests

5 participants