Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Cache refactoring #1634

Merged
merged 26 commits into from
Nov 10, 2022
Merged

ENH: Cache refactoring #1634

merged 26 commits into from
Nov 10, 2022

Conversation

ceb8
Copy link
Member

@ceb8 ceb8 commented Feb 5, 2020

This PR is a continuation of #442 so I am not sure it should be it's own separate thing (@bsipocz @keflavich @jwoillez).

Work included:

  • Rebased Cache timeout functionality #442 onto current master
  • Cache timeout functionality (see Cache timeout functionality #442)
  • Added configurable cache location (defaults to current cache location)
  • Added option to turn off caching all-together (addresses Add a config option to disable caching? #1511)
  • Added support for cache_timeout = None which prevents the cache from ever timing out
  • Added clear_cache function that removes all cache pickle files from the current cache location (addresses have timeout for cache #1531)
  • Added reset_cache_preferences that resets cache location, timout, and enabled/disabled status to astroquery defaults
  • Moved cache directory creation (if necessary) into to_cache function, guarding against users who manually blow away their cache while in the middle of a session (addresses Cache problem #1625)
  • _request function cache argument defaults to "None," which means that the default cache preference will be used, setting to either True or False overrides the the global preference.
  • Reordered the local_filepath or order in _request, so that savedir overrides the global cache location if set (I believe this is the intended behavior)

Questions:

  • I kept the updates to astroquery.cfg added by Cache timeout functionality #442, but did not further update, because I don't understand how this file is used. @bsipocz @keflavich what is the purpose of this file?
  • suspend_chache function: I don't understand how this function works or is being used. It sets the cache active status to False, however as far as I can tell we only cache when the to_cache function is explicitly called, so it isn't clear to me that it is actually doing anything. I assume its presence in the login function is to make sure that login credentials aren't accidentally cached which is very important, so I don't want to mess that up. I changed the private class member _cache_active into the public member use_cache to allow users to turn caching on/off, but it is unclear to me if I can just change self.obj._cache_active to self.obj.use_cache in the suspend_cache function or if something more complicated is needed. @bsipocz @keflavich if you can clarify this, I will update suspend_cache properly.

Outstanding work:

  • Documentation on how to use the new cache functionality
  • Test cases for the new cache functionality

@pep8speaks
Copy link

pep8speaks commented Feb 5, 2020

Hello @ceb8! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-09-12 15:47:26 UTC

@codecov
Copy link

codecov bot commented Feb 5, 2020

Codecov Report

Merging #1634 (016f6b4) into main (f74e7c0) will increase coverage by 0.10%.
The diff coverage is 93.75%.

@@            Coverage Diff             @@
##             main    #1634      +/-   ##
==========================================
+ Coverage   63.97%   64.08%   +0.10%     
==========================================
  Files         132      131       -1     
  Lines       16984    16927      -57     
==========================================
- Hits        10866    10847      -19     
+ Misses       6118     6080      -38     
Impacted Files Coverage Δ
astroquery/query.py 62.40% <93.75%> (+4.03%) ⬆️
astroquery/utils/commons.py 76.71% <0.00%> (-1.68%) ⬇️
astroquery/conftest.py 78.78% <0.00%> (-1.22%) ⬇️
astroquery/alfalfa/core.py 95.31% <0.00%> (-0.69%) ⬇️
astroquery/utils/tap/xmlparser/jobSaxParser.py 96.87% <0.00%> (-0.04%) ⬇️
astroquery/noirlab/core.py
astroquery/ipac/irsa/irsa_dust/core.py 81.62% <0.00%> (+0.04%) ⬆️
astroquery/vizier/core.py 78.59% <0.00%> (+5.13%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@jwoillez
Copy link
Member

@ceb8 Great to see you picking up this topic!

I should probably have gotten back to you sooner on your suspend_cache question. This is a context manager that let you cleanly disable the cache for some blocks of operations. See an example there:

with suspend_cache(self): # Never cache staging operations

My understanding of astroquery.cfg is that this is now an auto-generated file. You do not have to keep it. The default_cache_timeout = 86400 will appear by itself on first/new use, in the user's astroquery.cfg. But I may be wrong...

astroquery/__init__.py Outdated Show resolved Hide resolved
astroquery/query.py Outdated Show resolved Hide resolved
@bsipocz
Copy link
Member

bsipocz commented Apr 22, 2020

I had some old comments, but this now conflicts, so will come back to review once it's rebased.

@orionlee
Copy link
Contributor

orionlee commented Aug 25, 2020

@ceb will this PR support MAST case, where the MAST service uses a long-running polling mechanism such that the raw http response from MAST is really just a status code, not the actual responses?

I'm trying to ascertain to see if this PR makes #1578 obsolete.

@ceb8
Copy link
Member Author

ceb8 commented Aug 25, 2020

@orionlee No it doesn't make that PR obsolete. Because the MAST DBs update so regularly as new data comes in all the time, I had wanted this underlying work done before implementing true caching on MAST to avoid problems with outdated results being returned forever, but it doesn't solve any of the issues inherent in the long polling nature of the mast servers.

@orionlee
Copy link
Contributor

@ceb Thanks for the clarification.

@jwoillez
Copy link
Member

jwoillez commented Mar 3, 2021

What about looking into using one of the following?

@bsipocz bsipocz modified the milestones: v0.4.2, v0.4.3 May 15, 2021
@bsipocz
Copy link
Member

bsipocz commented Jun 29, 2021

I would change this to a draft PR rather than dragging through more milestone changes. We can change back from draft whenever someone will have the time of wrapping it up. Thanks!

@bsipocz bsipocz marked this pull request as draft June 29, 2021 23:41
@bsipocz bsipocz removed this from the v0.4.3 milestone Jun 29, 2021
@bsipocz bsipocz added this to the v0.4.7 milestone Mar 22, 2022
@ceb8 ceb8 marked this pull request as ready for review August 3, 2022 16:54
Copy link
Member

@eerovaher eerovaher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not having to call time.sleep in the expired cache test is very good. I still have two documentation suggestions, and an optional minor comment about a line in the tests. I would also recommend squashing out at least the PEP 8 commits.

astroquery/tests/test_cache.py Outdated Show resolved Hide resolved
astroquery/query.py Outdated Show resolved Hide resolved
astroquery/query.py Show resolved Hide resolved
@bsipocz
Copy link
Member

bsipocz commented Oct 28, 2022

I'll plan to have a final look at this on the train on Monday, thanks for all the patience @ceb8.

Copy link
Member

@bsipocz bsipocz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, one more round of review, and mostly nitpicks.

I haven't run all the remote tests, but the ones I run did pass, so I think we should go ahead.

My only serious concern is with the concept of picking. But given that we do that currently, I don't consider it as a blocker, but something for which we want to find a better solution.

astroquery/query.py Show resolved Hide resolved
/Users/username/.astropy/cache/astroquery/Simbad


To clear the cache:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be worth also mentioning how to get rid of all the cache.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By deleting the directory you mean?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the way I do it, but if there are any other ways, those can be listed, too.


Shown here are the cache properties, using Simbad as an example:

.. code-block:: python
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to cover this with the remote tests. However, if this is intentionally skipped, please indicate it with doctest-skip instead

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't think remote data is actually required for this, so I didn't give the remote data directive. I thought that unless explicitely skipped the tests would run under doctesting (once the file is added, right now the whole file is skipped at the top).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, OK, fair enough. The file has opted into testing on main already, so maybe this will just need a rebase to make sure these blocks pass.

@bsipocz
Copy link
Member

bsipocz commented Nov 10, 2022

OK, I would say let's merge this now. If there is anything left, small follow-ups are always welcome.

Also, @ceb8, could you please go through the cache-related issues and close the ones that are not relevant anymore after the merge? I see none are linked for auto-closure, but I don't think it's worth keeping this open for that reason.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

6 participants