Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The cloud_hosted flag for granule queries doesn't work #565

Open
chuckwondo opened this issue May 10, 2024 · 2 comments
Open

The cloud_hosted flag for granule queries doesn't work #565

chuckwondo opened this issue May 10, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@chuckwondo
Copy link
Collaborator

As discovered in discussion of #563, using the cloud_hosted parameter for a granule query does not work.

This reproduces the problem:

import earthaccess

results = earthaccess.search_data(
    short_name="VIIRSJ1_L2_OC",
    version="R2022.0",
    cloud_hosted=True,
    temporal=("2024-02-27 00:00:00", "2024-02-27 23:59:00"),
    count=10,
    bounding_box=(-180, 0, 0, 90),
)

The specified collection is not cloud hosted, so the query should return an empty list of results, but instead returns a non-empty list of results.

Alternatively, instead of returning an empty list of results, we could raise an exception. If we take this route, we would need to decide whether to use a built-in type, such as ValueError or TypeError, or define a custom exception.

@chuckwondo chuckwondo added the bug Something isn't working label May 10, 2024
@chuckwondo
Copy link
Collaborator Author

Another option would be to eliminate the cloud_hosted parameter from granule queries, particularly given that it is not actually directly supported by the underlying CMR Search API. Only collection queries support it. Thus, this parameter requires us to make an implicit collection query under the covers, prior to submitting the granule search (if there is a collection with a cloud_hosted value matching the parameter value).

By eliminating the parameter, it is up to the user to either know whether or not the collection is cloud hosted, or to issue a separate collection query first to determine whether or not it is cloud hosted. Given that we would need to make such a collection query under the covers anyway, if we keep the cloud_hosted parameter for granule queries, there would be no difference in performance. In fact, by not implicitly performing the collection query, the user is able to avoid the extra query, if they already know whether or not the collection is cloud hosted. Further, being explicit over implicit is the 2nd principle of The Zen of Python, so it is worth considering.

@betolink
Copy link
Member

Thanks for framing this problem @chuckwondo, I'm inclined to retain the cloud_hosted parameter at the granule level in order to save our users the extra query. Likewise, there is no DOI parameter at the granule level and (anecdotally) this is one the most useful features in the search_data method according to users.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Status: 🆕 New
Development

No branches or pull requests

2 participants