
Unify IOC/COOPS API #125

Draft: wants to merge 23 commits into master

Conversation

@pmav99 (Member) commented Jan 26, 2024

Addresses #99

@pmav99 pmav99 requested review from brey and tomsail January 26, 2024 17:51
@pmav99 pmav99 marked this pull request as draft January 26, 2024 17:52
@pmav99 (Member, Author) commented Jan 26, 2024

@SorooshMani-NOAA I pushed on the dev branch. Have a look.
I added the code in ioc.py but implemented the tests in a different file so that it is easier to run them.
Feel free to change anything. Don't worry too much about commit messages, we will rebase before we merge.

@brey & @tomsail, if you find the time to check out the dev branch and try it out, I would appreciate any feedback.

codecov bot commented Jan 29, 2024

Codecov Report

Attention: Patch coverage is 96.31902%, with 6 lines in your changes missing coverage. Please review.

Project coverage is 90.78%. Comparing base (13ba6be) to head (6a598b5).
Report is 10 commits behind head on master.

❗ Current head 6a598b5 differs from the pull request's most recent head e6a8ff8. Consider uploading reports for commit e6a8ff8 to get more accurate results.

File               Patch %   Lines
searvey/utils.py    37.50%   5 missing ⚠️
searvey/ioc.py      99.31%   0 missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #125      +/-   ##
==========================================
+ Coverage   90.18%   90.78%   +0.60%     
==========================================
  Files          13       13              
  Lines        1049     1205     +156     
  Branches      155      140      -15     
==========================================
+ Hits          946     1094     +148     
- Misses         56       64       +8     
  Partials       47       47              


@SorooshMani-NOAA (Contributor):

COOPS web API uses redirects, so we need to make sure we always follow redirects for COOPS:
https://www.python-httpx.org/compatibility/

@SorooshMani-NOAA (Contributor):

@pmav99 do you have any suggestions for debugging issues in the multithread/process calls by breakpoint?

@pmav99 (Member, Author) commented Jan 31, 2024

Off the top of my head, no, not really. Do the functions work when you run them directly?

@SorooshMani-NOAA (Contributor):

I wanted to attach the debugger and step through the code when testing different inputs. For now I've gotten by with print statements when I see issues, but I wanted to know if there's a better way of doing it.

@pmav99 (Member, Author) commented Feb 1, 2024

The problem with using breakpoint() or pdb.set_trace() is that the breakpoint will get executed as many times as you call the function. I doubt this is what you want.

But, TBH, this question feels a bit strange. Just write and test the functions that retrieve and parse the data without any concurrency. Once the retrieve_data and parse_data functions are correct, you can plug in multifutures (or whatever) to add concurrency and make the whole thing faster. Trying to integrate concurrency from the get-go is probably overcomplicating things.

That being said, what can help with debugging multiprocessing/multithreading code is using the logging module, which if configured correctly, can print information about the current thread/process. For example:

import logging
import time

import multifutures as mf

# Include the process id and thread name in every log record so that
# each message can be attributed to the worker that emitted it.
logging.basicConfig(
    level=logging.DEBUG,
    style="{",
    format="{levelname:8s}; {process:6d}; {threadName:8s}; {asctime:s}; {name:<15s} {lineno:4d}; {message:s}",
    force=True,
)
logger = logging.getLogger(__name__)

def foo(a):
    time.sleep(0.3)
    logger.info("The value of a=%d", a)

logger.info("Start Multiprocessing")
results = mf.multiprocess(foo, [{"a": i} for i in range(5)])
logger.info("Start Multithreading")
results = mf.multithread(foo, [{"a": i} for i in range(5)])
logger.info("Done")

@pmav99 (Member, Author) left a review comment:

Looks good.

If there are going to be so many functions shared between COOPS and IOC, we should extract them to a separate module and import them from there.

@SorooshMani-NOAA (Contributor):

... we should extract them ...

I agree. I tried to stay as close to the updated IOC code as possible, which is why I basically copy-pasted everything and then started modifying. Some of these functions could easily go into utils to be used by all the datasource APIs, but I also don't want to start abstracting too soon.

@pmav99 (Member, Author) commented Feb 5, 2024

WRT the names mapping: I haven't tried this, but I was wondering whether it is actually simpler to do something like this:

  1. Run the code we have in order to generate the final dictionaries
  2. Hardcode the final dictionaries in the file.
    Any time a product gets added/removed we add/remove a line from the respective dictionary.

Yes, the code will be longer and repetitive, but we have no code generation, no .update(), no ugly comprehensions, no ** etc. Just declarative code.
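
For illustration, the hardcoded version might look something like this (the field and product names below are made up, not the actual COOPS/IOC mappings):

```python
# Declarative, hardcoded mappings: one line per field, no code generation.
# Adding/removing a product field means adding/removing one line here.
# All names below are illustrative only.
WATER_LEVEL_FIELDS = {
    "t": "time",
    "v": "value",
    "s": "sigma",
    "f": "flags",
}

HOURLY_HEIGHT_FIELDS = {
    "t": "time",
    "v": "value",
    "s": "sigma",
}
```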

@SorooshMani-NOAA (Contributor):

@pmav99 it's a good idea, I'll do that!

@SorooshMani-NOAA (Contributor):

@pmav99 I'm trying to decide what to do with the default datetimes. The issue is that for some products:

  • I need to include dates more than one month before the current time in order to fetch anything, e.g. high_low, hourly_height, and monthly_mean.
  • Or I must not include dates more than one month prior to the current date, e.g. one_minute_water_level.

To resolve the first issue I can just create a COOPS-specific _resolve_start_date and set the default to 60 days. For example, as of today (February 6th, 2024), for monthly_mean I should specify 2023/12/31 to 2024/02/06 to get data. If instead I specify 2024/01/01, it fails!

But specifying 60 days causes problems for one_minute_water_level. The max interval for this product is 4 days, but there's no data older than about one month (at least for the station I'm testing). So if I specify 2024/01/09 to 2024/02/06, it fails because the chunk from the 9th to the 13th doesn't have data.

I'm not sure what's best. Should I just leave it to the user and say the default might not work for some products? Or should we add more dictionaries with info about the default start date range (relative to now) for each product? I'm not even sure these limits hold for all stations! Should I first consult with COOPS?

If it's the latter, I think it makes sense for now to go with "default might not work" and amend later.

Actually, another question: should I raise an error if there's no data, or just warn the user? COOPS gives me an error message when there's no data.

searvey/coops.py:

    if "error" in result.result:
        msg = json.loads(result.result)["error"]["message"]
        logger.error("Encountered an error response!")
        raise ValueError(msg)

@SorooshMani-NOAA (Contributor):

@pmav99 should I raise or ignore? What do you think?

@pmav99 (Member, Author) commented Feb 7, 2024

I think ignore is the correct approach. When dealing with historical data it's not that uncommon to have big gaps, e.g. a station might have been offline for a few months (broken instrument or whatever). This does not mean that the final dataframe will be empty; other requests/urls might return data. And even if the final dataframe is empty, it is not difficult for the user to deal with it.

I guess we could add a raise_if_empty argument to fetch_*() but I don't think it offers that much value.
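
A sketch of the ignore-and-continue approach (function and key names are hypothetical, not the actual searvey code; the error shape follows the snippet above):

```python
import json
import logging

logger = logging.getLogger(__name__)

def parse_coops_response(text: str) -> list:
    """Return the records from one response, or [] if COOPS reported an error."""
    data = json.loads(text)
    if "error" in data:
        # Log and skip instead of raising: other chunks/requests may still
        # return data, and an empty final dataframe is easy to detect.
        logger.warning("COOPS error response: %s", data["error"]["message"])
        return []
    return data.get("data", [])
```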

@SorooshMani-NOAA (Contributor):

@pmav99 except for the two questions above, I'm done with the COOPS API. I know that the metadata API needs some adjustment, but I'm assuming we want to keep that separate from the data API update, right?

@pmav99 (Member, Author) commented Feb 7, 2024

WRT the default dates: I have to admit that I haven't really looked into this, but couldn't we have a mapping with the default durations? If the data are not too big, we could even return the maximum interval by default. Am I missing something?
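
Such a mapping might be sketched like this (the per-product windows below are illustrative guesses, not verified COOPS limits, and the function name is hypothetical):

```python
import datetime

# Hypothetical default look-back windows per product.
DEFAULT_LOOKBACK = {
    "one_minute_water_level": datetime.timedelta(days=4),
    "hourly_height": datetime.timedelta(days=60),
    "monthly_mean": datetime.timedelta(days=60),
}

def resolve_default_start_date(product: str, now: datetime.datetime) -> datetime.datetime:
    # Fall back to 30 days for products without an explicit entry.
    return now - DEFAULT_LOOKBACK.get(product, datetime.timedelta(days=30))
```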

@SorooshMani-NOAA (Contributor):

I agree that we might be able to cover most cases with dictionaries, but I don't want to add too many dicts: one for how far back I have to go before data starts (e.g. for mean, hi/lo, etc.), one for how far back I can go before old data is removed (e.g. one-minute heights), and so on. I especially don't want to add a dict for each limitation that isn't documented anywhere and that I only found by trial and error! For now I'll just let the user figure it out, and we can address this later. This PR is already a lot of changes and dictionaries!

@SorooshMani-NOAA SorooshMani-NOAA linked an issue Feb 14, 2024 that may be closed by this pull request
@SorooshMani-NOAA (Contributor):

@pmav99 I'm done from my side. Please let me know if there are any issues I need to fix; otherwise feel free to merge whenever you can. I'm not sure why the tests are failing; please let me know if I need to do anything about it.

@pmav99 (Member, Author) commented May 15, 2024

@SorooshMani-NOAA I rebased on master and pushed. You will see that a bunch of COOPS tests are broken; please fix them. Once this is done, please squash the COOPS commits (rebase and push), keeping as many commits as you see fit. After that we need to review the whole thing to see whether it is ready for merging.

@SorooshMani-NOAA (Contributor):

Thank you @pmav99. The rebase seems to have brought in some of the changes for the new COOPS API, but not all of them; I'll patch things up to prep for the merge.

@SorooshMani-NOAA (Contributor):

I just noticed you created a _coops_api.py file. Did you have any reorg in mind, or is it a result of automated conflict resolution?

@SorooshMani-NOAA (Contributor):

@pmav99 one question: the new IOC API also has station type info, and for a given station ID there might be different types. Should I get a unique set for the stations API, since we don't have that column?

STATIONS_COLUMNS: list[str] = [
    "provider",
    "provider_id",
    "country",
    "location",
    "lon",
    "lat",
    "is_active",
    "start_date",
    "last_observation",
    "geometry",
]
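
One way to get that unique set (toy data; the `provider_id` column follows STATIONS_COLUMNS above, the rest is made up):

```python
import pandas as pd

# Toy stations table: "abed" appears twice because it has two station types.
stations = pd.DataFrame(
    {
        "provider": ["IOC", "IOC", "IOC"],
        "provider_id": ["abed", "abed", "abur"],
        "country": ["UK", "UK", "JP"],
    }
)

# Keep one row per station id, dropping the type-induced duplicates.
unique_stations = stations.drop_duplicates(subset="provider_id").reset_index(drop=True)
```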

@SorooshMani-NOAA (Contributor):

@pmav99 all the tests pass; please go ahead with the merge as soon as you can so that we don't need to spend a lot of time rebasing again later. The pre-commit failure is weird: locally the notebook is cleaned up and pre-commit doesn't complain!

Successfully merging this pull request may close these issues:

  • COOPS Metadata for All Product Types
  • Adapt COOPS to have an IOC-like API