
Refresh cached metadata #95

Open
pmav99 opened this issue Jul 24, 2023 · 3 comments

Comments

@pmav99
Member

pmav99 commented Jul 24, 2023

In IOC, COOPS and USGS we are caching the retrieved metadata. This is really useful, e.g. when running the tests, but it can be problematic for long-running processes (in the range of days/weeks/months). The first call caches the metadata and, currently, there is no easy way to refresh it.

I was thinking that we should add an extra argument to the get_*_stations() functions, something like refresh_cache: bool = False. This way we keep the existing behavior, and anyone who needs to refresh the cache is able to do so.

As far as the actual implementation goes, we would need something like this: https://stackoverflow.com/a/37654201/592289

pinging @brey @SorooshMani-NOAA

@brey
Contributor

brey commented Jul 24, 2023

I like the idea. We need to establish a threshold for the refresh to kick in. Ideally, this should be internal and not visible to the user, although a warning/info comment might be required for transparency.

Maybe we also need to document how users should achieve persistence when using searvey, if that is required.

@SorooshMani-NOAA
Contributor

Having the ability to reset helps. Ideally this should be available as an automatic operation (e.g. per day/hour/etc.) for non-developer users and as a manual ability to reset for others. We already know that calling cache_clear can be used for the manual part, but for the automatic part this is an interesting idea:
https://stackoverflow.com/questions/31771286/python-in-memory-cache-with-time-to-live

There's also this package:
https://cachetools.readthedocs.io/en/latest/
Although maybe let's think twice before adding more dependencies
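
For reference, one stdlib-only way to get time-based expiry is to add a time bucket to the lru_cache key. This is a sketch under my own assumptions (the ttl_cache name and its timer parameter are hypothetical), not necessarily what the linked answers or cachetools do.

```python
import functools
import time


def ttl_cache(ttl_seconds: float, maxsize: int = 128, timer=time.monotonic):
    """Cache results, recomputing once the current time bucket changes."""

    def decorator(func):
        @functools.lru_cache(maxsize=maxsize)
        def _cached(_bucket, *args, **kwargs):
            # _bucket is only part of the cache key; func never sees it.
            return func(*args, **kwargs)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # The bucket number changes every ttl_seconds, which
            # forces a cache miss for calls made in a new bucket.
            bucket = int(timer() // ttl_seconds)
            return _cached(bucket, *args, **kwargs)

        # Keep the manual reset available as well.
        wrapper.cache_clear = _cached.cache_clear
        return wrapper

    return decorator


@ttl_cache(ttl_seconds=3600)
def load_example_metadata():
    # Hypothetical stand-in for the metadata download.
    return ("station-a", "station-b")
```

Note that entries expire at bucket boundaries, so a value cached late in a bucket lives for less than ttl_seconds. The manual cache_clear still works alongside the automatic expiry.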

@pmav99
Member Author

pmav99 commented Jul 24, 2023

WRT persisting searvey's metadata, we are using standard (geo)pandas, therefore I don't think we need to provide a specific API for this. Adding a note in the docs and/or an example in the notebooks wouldn't necessarily be a bad idea though.
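
A rough sketch of what such a docs note could show, using plain pandas only. The DataFrame contents, column names and helper functions are made up for illustration; a real stations GeoDataFrame would more likely be saved through geopandas.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up stand-in for a stations metadata DataFrame.
stations = pd.DataFrame(
    {
        "station_id": ["a1", "b2"],
        "lon": [10.5, -70.25],
        "lat": [40.0, 42.5],
    }
)


def save_metadata(df: pd.DataFrame, path: Path) -> None:
    # Write the metadata to disk so a later run can reuse it.
    df.to_csv(path, index=False)


def load_metadata(path: Path) -> pd.DataFrame:
    # Read back previously saved metadata instead of downloading again.
    return pd.read_csv(path)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "stations.csv"
    save_metadata(stations, path)
    restored = load_metadata(path)
```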

I didn't think of automatically invalidating the cache after some time, but I agree it is a good idea, and that SO answer seems to provide a rather elegant way of doing so without introducing any 3rd party dependencies. WRT adding a runtime warning, I am -1 to be honest. For sure we should document it, but a warning each time you call a function seems to be too much. Moreover, you would get 3 warnings when you call searvey.get_stations() etc...
