Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor WFAU image retrieval to reject deprecated images #2809

Draft
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

keflavich
Copy link
Contributor

This refactor also adds a new method to retrieve the very useful metadata tables rather than just scrape them for download URLs.

Solves #2808

Copy link
Member

@bsipocz bsipocz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some comments, but overall I turned this to a draft and plan to come back once it's finished, CI passes and it's out of the draft status

image_width=1 * u.arcmin, image_height=None,
radius=None, database=None,
programme_id=None, get_query_payload=False):
def get_image_list(self, coordinates, *, radius=None, ignore_deprecated=True, **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spell out kwargs, we try to get rid of this way of hiding them so we should not introduce more of these.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, ok. That's a lot of duplicated text, which I don't love, but fine

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but if it's public API, one would need useful docstrings, too. dumping kwarg out there is not exactly useful. OTOH, if it's an option of not getting slightly overlapping public API then certainly I would be in favour of not duplicating.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_image_list is more of an internal-pointing method. It is technically usable, and useful, to users, but get_image_table is more useful and I prefer to point people there. I'd actually lean toward making _get_image_list private before duplicating its docstring and keyword arguments. Let me know which you prefer

Comment on lines 444 to 446
self._penultimate_response = response
response = self._check_page(response.url, "row")
self._last_response = response
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this really necessary?

Also, try to avoid overwriting variables to help future debugging, the second response could be response_row, or whatever makes sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, this was for debugging, I'll remove it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the _last_response addition has historically been very useful in debugging other tools, so I prefer to keep it, but I agree about the variable renaming

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeap, keeping _last_response is fine, I really just meant to spot _penultimate_response ;) and the rename.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair enough, I could drop that. But when else do you get to use penultimate in the real world?!

@bsipocz bsipocz added this to the v0.4.7 milestone Aug 14, 2023
@bsipocz
Copy link
Member

bsipocz commented Aug 14, 2023

Re "solves" - it's not a "magic" word, won't auto-close the issue. Please use fix, or close, or resolve instead :) https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword

@bsipocz bsipocz linked an issue Aug 14, 2023 that may be closed by this pull request
@keflavich
Copy link
Contributor Author

local tests passing:

pytest astroquery/ukidss -x --pdb --remote-data
astroquery/ukidss/tests/test_ukidss.py ........                                                                                                                                                          
astroquery/ukidss/tests/test_ukidss_remote.py 	........                                                                                                                                                   

@@ -479,6 +479,26 @@ def parse_imagequery_page(self, html_in, radius=None):
for row in html_in.split("\n")])
return ascii.read(html, format='html')

def extract_urls(self, html_in):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I commented this before in another module: I think methods like this should not be class methods, definitely not public ones, but instead private functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I agree - happy to make this, and get_image_list, both private

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can't make get_image_list private, at least not without a deprecation period.

Comment on lines 444 to 446
self._penultimate_response = response
response = self._check_page(response.url, "row")
self._last_response = response
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeap, keeping _last_response is fine, I really just meant to spot _penultimate_response ;) and the rename.

@keflavich
Copy link
Contributor Author

The last failure I see here is in cadc, not wfau - is this real?

@bsipocz
Copy link
Member

bsipocz commented Aug 15, 2023

The last failure I see here is in cadc, not wfau - is this real?

Opened #2811 for it. It looks consistent in CI, but I cannot yet reproduce it locally.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

WFAU module has no way of identifying deprecated WFCAM images
2 participants