Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adapt COOPS to have an IOC-like API #99

Open
SorooshMani-NOAA opened this issue Aug 2, 2023 · 13 comments · May be fixed by #125
Open

Adapt COOPS to have an IOC-like API #99

SorooshMani-NOAA opened this issue Aug 2, 2023 · 13 comments · May be fixed by #125
Assignees

Comments

@SorooshMani-NOAA
Copy link
Contributor

We'd like to have an standardized API for all the station data. Also we'd like to make sure all the possible queries through searvey are successful assuming the server gives a correct response (e.g. vector data).

To this end as we discussed we'll start adding a parallel API for coops stations and explore what parts of the current API stays relevant to be kept.

stormevents relies on COOPS module, so it's a good test case for keeping compatibility to the old API as much as possible during this development

@SorooshMani-NOAA SorooshMani-NOAA self-assigned this Aug 2, 2023
@SorooshMani-NOAA
Copy link
Contributor Author

Partially addressed in #123

@SorooshMani-NOAA
Copy link
Contributor Author

With the new metadata API implemented in #123, one needs to use the COOPS_Query API to be able to fetch data. Using COOPS_Station or coops_product_* API won't work due to their internal reliance on the old metadata API.

From @AtiehAlipour-NOAA

After the changes were made to the 'searvey' package (#110), I updated the 'searvey' package and ran 'autoval'. However, I was getting the same error (unable to find ~130 stations) while using the 'searvey' package to download data. Here is the function I was using for downloading NOS observation data:

image

@SorooshMani-NOAA suggested using 'COOPS_Query' instead, which resolved the issue. 'autoval' now uses the updated 'searvey' to download all NOS stations. Here's the new function:

image

@SorooshMani-NOAA
Copy link
Contributor Author

@pmav99 as I started working on this, I realized that there are some limitations on the combination of products, data length and data interval in the web API. These are documented in: https://api.tidesandcurrents.noaa.gov/api/prod/ at different sections.

Sometimes these incompatibilities are just ignored by the web API and the default is returned (e.g. for water_level only 6min interval data exists and specifying interval=h or interval=1 is ignored). Some other times the combination results in error responses, like:

image

Do you think it's reasonable to account for all of this in searvey, or should we just return whatever the web API provides? e.g. only raise error if we see response error message and don't do anything if an incompatibility in the inputs are ignored by the web API?

@SorooshMani-NOAA
Copy link
Contributor Author

@pmav99 apart from the error issue above, what's your vision for the unified API? I mean IOC accepts enddate and period, but USGS and COOPS accept start and end dates instead. Do we want to keep all of them to use enddate and period or do we want to allow different params for different providers?

@pmav99
Copy link
Member

pmav99 commented Jan 23, 2024

@SorooshMani-NOAA go with start/end dates. For the IOC I was even thinking of something that would allow to have a different start/end date for each station. I don't think that different end dates is that useful, but the start_date can be useful when you are scraping data and a station goes down:

def my_func(
    ioc_codes: list[str],
    start_dates: list[str] | list[pd.Timestamp],
    end_dates: list[str] | list[pd.Timestamp],
    # ...

But that's just a thought. I am happy to hear your thoughts.

@pmav99
Copy link
Member

pmav99 commented Jan 23, 2024

If the API throws an error, I guess that we should also throw an Exception.
If the API ignores something, ideally the client code should throw a warning, but detecting when something is ignored can be tricky if there is no explicit documentation.

@SorooshMani-NOAA
Copy link
Contributor Author

As we discussed in the meeting, we're going to unify/standardize how searvey's IOC (using IOC API) and COOPS work, with better arguments (e.g. start_date instead of period, etc.), logic of multiprocessing/threading, etc.

I'll wait for the new branch for IOC to be created and then will work on COOPS updates based on that.

Other relevant highlights from the meeting for new API:

  • Drop support for time_zone parameter in COOPS, we always use UTC (time_zone=gmt in request). If user's input is tz-naive, it is assumed to be UTC, if it has non-UTC tz, it is converted to UTC before sending request to COOPS server. We'll drop support for product-interval combinations that don't support gmt as time_zone
  • Use dictionary of pd.DataFrame for multiple stations instead of xr.Dataset
  • Add support for current and current prediction queries (results in multi column DataFrame) in the new API
  • Use default MSL datum in the new COOPS API, but close change datum default to MSL for coops_product_within_region #124 and leave the old API as it is

@SorooshMani-NOAA
Copy link
Contributor Author

Related to #126

@SorooshMani-NOAA SorooshMani-NOAA linked a pull request Feb 14, 2024 that will close this issue
@SorooshMani-NOAA
Copy link
Contributor Author

I will need to work on better documentation and updating notebooks

@SorooshMani-NOAA
Copy link
Contributor Author

@pmav99 can you please share the code you were using to get multiple stations using the new API? I'd like to add it as an example in the coops notebook.

@pmav99
Copy link
Member

pmav99 commented Apr 10, 2024

station_ids = ["abas", "acaj", "acap2"]
start_dates = pd.DatetimeIndex(["2023"] * len(station_ids))
end_dates = pd.DatetimeIndex(["2024"] * len(station_ids))

ioc._fetch_ioc(
    station_ids=station_ids,
    start_dates=start_dates,
    end_dates=end_dates,
    rate_limit=None, 
    http_client=None, 
    multiprocessing_executor=None, 
    multithreading_executor=None
)

and it returns a dictionary of dataframes.

@SorooshMani-NOAA
Copy link
Contributor Author

Thanks, I remember we discussed this, but I don't recall what was the outcome:
This is a "private" API that we're calling, would it be OK to advertise it in the example notebook? Or should I just include an example with one station for now?

@pmav99
Copy link
Member

pmav99 commented Apr 10, 2024

I would just stick with one station for now. After all, you can get the same functionality with a for loop.

Let's discuss this on the meeting too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants