The `cloud_hosted` flag for granule queries doesn't work #565

chuckwondo · 2024-05-10T23:05:11Z

As discovered in discussion of #563, using the cloud_hosted parameter for a granule query does not work.

This reproduces the problem:

import earthaccess

results = earthaccess.search_data(
    short_name="VIIRSJ1_L2_OC",
    version="R2022.0",
    cloud_hosted=True,
    temporal=("2024-02-27 00:00:00", "2024-02-27 23:59:00"),
    count=10,
    bounding_box=(-180, 0, 0, 90),
)

The specified collection is not cloud hosted, so the query should return an empty list of results, but instead returns a non-empty list of results.

Alternatively, instead of returning an empty list of results, we could raise an exception. If we take this route, we would need to decide whether to use a built-in type, such as ValueError or TypeError, or define a custom exception.
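To make the two options concrete, here is a minimal sketch of how they could differ. Everything in it is hypothetical and not part of earthaccess: the `CloudHostedMismatchError` class, the `filter_by_cloud_hosted` helper, and the `strict` switch are made up for illustration, and the sketch assumes we already know from a collection query whether the collection is cloud hosted.

```python
from typing import Any, List, Optional


class CloudHostedMismatchError(ValueError):
    """Hypothetical custom exception (not part of earthaccess).

    Subclassing ValueError lets callers catch either the specific
    type or the built-in one.
    """


def filter_by_cloud_hosted(
    granules: List[Any],
    collection_cloud_hosted: bool,
    cloud_hosted: Optional[bool] = None,
    strict: bool = False,
) -> List[Any]:
    """Apply a cloud_hosted filter to granule results (illustrative only)."""
    # No filter requested, or the collection matches the requested value:
    # return the results unchanged.
    if cloud_hosted is None or cloud_hosted == collection_cloud_hosted:
        return list(granules)
    # Mismatch, option 2: raise so the caller notices immediately.
    if strict:
        raise CloudHostedMismatchError(
            f"cloud_hosted={cloud_hosted} requested, but the collection's "
            f"cloud_hosted value is {collection_cloud_hosted}"
        )
    # Mismatch, option 1: behave like an ordinary filter that matched nothing.
    return []
```

With `strict=False` a mismatch looks like any other query that matched nothing. With `strict=True` the caller gets an error that can be caught as either `CloudHostedMismatchError` or `ValueError`.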


chuckwondo · 2024-05-10T23:17:37Z

Another option would be to eliminate the cloud_hosted parameter from granule queries, particularly given that it is not actually directly supported by the underlying CMR Search API. Only collection queries support it. Thus, this parameter requires us to make an implicit collection query under the covers, prior to submitting the granule search (if there is a collection with a cloud_hosted value matching the parameter value).

By eliminating the parameter, it is up to the user to either know whether or not the collection is cloud hosted, or to issue a separate collection query first to determine whether or not it is cloud hosted. Given that we would need to make such a collection query under the covers anyway, if we keep the cloud_hosted parameter for granule queries, there would be no difference in performance. In fact, by not implicitly performing the collection query, the user is able to avoid the extra query, if they already know whether or not the collection is cloud hosted. Further, being explicit over implicit is the 2nd principle of The Zen of Python, so it is worth considering.
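For illustration, the explicit two-step approach could look roughly like the sketch below. The helper name `search_cloud_granules` is hypothetical. The two search functions are passed in as arguments so the sketch runs without network access; the assumption is that a user would pass `earthaccess.search_datasets` and `earthaccess.search_data`, and that the collection search accepts `short_name`, `version`, and `cloud_hosted` keyword arguments.

```python
from typing import Any, Callable, List


def search_cloud_granules(
    search_datasets: Callable[..., List[Any]],
    search_data: Callable[..., List[Any]],
    short_name: str,
    version: str,
    **granule_filters: Any,
) -> List[Any]:
    """Explicit two-step search: collection query first, then granules."""
    # Step 1: ask whether a cloud-hosted collection matches short_name/version.
    collections = search_datasets(
        short_name=short_name, version=version, cloud_hosted=True
    )
    if not collections:
        # No cloud-hosted collection matched, so skip the granule query.
        return []
    # Step 2: plain granule search, with no cloud_hosted argument.
    return search_data(short_name=short_name, version=version, **granule_filters)
```

A user who already knows the collection is cloud hosted can skip step 1 and call the granule search directly, which is the saved query mentioned above.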

betolink · 2024-05-21T03:36:38Z

Thanks for framing this problem, @chuckwondo. I'm inclined to retain the cloud_hosted parameter at the granule level in order to save our users the extra query. Likewise, there is no DOI parameter at the granule level, and (anecdotally) this is one of the most useful features in the search_data method according to users.

chuckwondo added the bug label May 10, 2024

mfisher87 mentioned this issue May 21, 2024

Problem Accessing NASA Ocean Color Data with Earth Access tools #563

Open
