Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sane behaviour for servicetype="image" #429

Open
msdemlei opened this issue Feb 28, 2023 · 15 comments
Open

Sane behaviour for servicetype="image" #429

msdemlei opened this issue Feb 28, 2023 · 15 comments

Comments

@msdemlei
Copy link
Contributor

In our Registry interface, we have had an "image" service type for a long time. It has so far been a synonym for sia1. Now that our Registry discovery supports sia2, too, the question is what to do with the image service type.

Here's a few options I have been pondering:

(1) Leave things as they are. Not pretty, because why should sia2 not be image?

(2) Deprecate servicetype="image" and then one day drop it. Well, we can always do that when we find out we can't come up with useful semantics.

(3) Make image the union of sia and sia2. I think that's about the worst we can do, because several services offer both sia and sia2, and these will then be queried twice (though we can't avoid duplicate results in general; in DaCHS services, for instance, there's just one global SIA2 service but many per-data collection SIA services, and it's hard to work out which SIA results you might also get through SIA2).

(4) Make image return sia services where available and sia2 otherwise. Since I personally don't like sia2 in its current state, I'd like that a lot. But we'd have quite a bit of explaining to do if we went for that.

(5) Make image return sia2 services where available and sia otherwise. I think that is about the least surprising thing we can do.

For (3) through (5), however, we have the additional problem of what get_service() would return. For reference, SIAService's search has the signature

(pos[, size, format, intersect, verbosity]) -> SIAResults

whereas SIA2Service's has

(pos=None, band=None, time=None, pol=None, field_of_view=None, spatial_resolution=None, spectral_resolving_power=None, exptime=None, timeres=None, publisher_did=None, facility=None, collection=None, instrument=None, data_type=None, calib_level=None, target_name=None, res_format=None, maxrec=None, session=None, **kwargs) -> SIA2Results

– where pos is, I think, a rather different beast in SIA2Service than in SIAService (it is in the protocol, I've not looked at the pyVO implementation).

This means that if we want to enable something like:

for svc in registry.search(servicetype="image"):
  matches = svc.search(???)
  for row in matches:
    # do something interesting with row

I think we won't get around a compatibility shim. What could this be? We could implement SIA2's extra conditions as good as we can locally on top of SIAResults (with suffiently rich service responses, that would be reasonable for band, time, and serveral of the others; btw: you don't have any guarantees that the constraints will be honoured in SIA2 either). But, really, someone else would have to contribute code for that. Volunteers?

Me, I think I'd do a GenericImageService with a search that just has pos and size, which would be translated into a CIRCLE query against SIA2 services. The return rows, on the other hand, would be pretty much SIA2Results, which are basically obscore rows.

So, what does everyone think? Do you perhaps have better ideas? I predict our future selves will be grateful if we think about this a bit. We'll always have similar problems when we have two major versions of a protocol coexisting, and this will probably happen quite a few times in the VO's future.

Should we have a side meeting on this in Bologna?

@bsipocz
Copy link
Member

bsipocz commented Mar 2, 2023

I'm very much leaning towards option 3) and leave it to the user to decide which one to use. Then the second choice is 5).

But I fully agree that it's something to be raised for input from the archives, especially from those who decide to serve both versions of the protocol. I would expect while some may prefer SIA v1, others very much prefer SIA v2.

@bsipocz
Copy link
Member

bsipocz commented Mar 2, 2023

Btw, this discussion is probably not limited to python/pyvo, but also I wonder whether it should be the same for e.g. the Gavo webform of the registry when one selects "Image Access"? Which options does it do currently?

@msdemlei
Copy link
Contributor Author

msdemlei commented Mar 7, 2023 via email

@bsipocz
Copy link
Member

bsipocz commented Mar 7, 2023

If the pyVO release planning allows this: should we take this to the
apps session in Bologna and hope for coffee break discussions to
figure out the right way to deal with this?

Yes, I think that's reasonable to bring up as a discussion item.

Also, I don't think this should necessarily hold up v1.5, currently, I consider #428 and #386 blockers for the release, but nothing else.

@msdemlei
Copy link
Contributor Author

msdemlei commented Mar 8, 2023 via email

@andamian
Copy link
Contributor

andamian commented Mar 8, 2023

Yes, we can probably find a spot in an Apps session for this discussion.

Without being very familiar with SIA1, I've always considered SIA2 as a natural evolution of the standard that will eventually supersede version 1. In that case, (5) made the most sense. However, recent SIA2 refactoring and the comments around it make them look like competing versions of the same standard. Is that correct? If so, (3) is the best option.

@msdemlei
Copy link
Contributor Author

msdemlei commented Mar 9, 2023 via email

@msdemlei
Copy link
Contributor Author

I've given a talk on that at the Bologna Interop (https://wiki.ivoa.net/internal/IVOA/InterOpMay2023Apps/sia-notes.pdf), and my conclusion from having prepared the talk and the reactions at the conference is: servicetype="image" was a mistake because it suggests an abstraction far too leaky to be useful (cf. https://en.wikipedia.org/wiki/Leaky_abstraction).

I suppose the intention was (or would now be) to enable an all-VO image search in that way. But such a thing is a lot more complicated that just looking for a certain kind of service. For one, you'd still be missing ObsTAP, and I've not even started with de-duplication, unifying interfaces, and so on.

Hence, here is what I propose:

(1) both servicetype="image" and servicetype="spectrum" won't change their current behaviour (i.e., they'll return sia1 or ssap services, respectively), but they will result in a deprecation warning saying, for now, that people should use sia or ssap instead. That's for the next release.

(2) We create a new module for doing global dataset discovery that hides the various details of protocols and all that from the user. That's definitely material for the next release, and I can't promise I'll invest terribly much time into that, but I'm not convinced that's what we need if we want user-comprehensible global dataset discovery. I'd say more about it in a bug I'd open when I close this. If and when the new functions are available, the deprecation warnings would point to them.

What does everyone think?

@bsipocz
Copy link
Member

bsipocz commented May 19, 2023

I'm not super happy about this, there was also a reasonably widespread notion that end-users shouldn't know much about vo, e.g. know sia vs sia2. With this proposal, we actually force them to know the difference and whether a dataset is offered via one or the other (e.g. at Irsa we have a mixture of the two and we really should not expect the users to know which datasets served by either).

@msdemlei
Copy link
Contributor Author

msdemlei commented May 20, 2023 via email

@andamian
Copy link
Contributor

andamian commented Jun 1, 2023

I agree that servicetype="image" -> SIA1 is incorrect but I don't know what the appropriate solution is.
global_find_images might be a bit too ambitious. There could be tens of services, do we search all of them even if they will return duplicate data?

For a data centric functionality, the workflow should probably be:

  1. Search for all image services and return a list of services
  2. (optional) User selects a subset of services. This can be based on the data provider or service type.
  3. Search for data in the service subset and return a list
  4. (optional) User filters the data list in a subset
    5.. Download data in the subset (with potential SODA operations: cutouts, packaging etc)

servicetype can achieve this with the union option. It will then be up to the user to pick services from the result and use them accordingly and this is the part that a higher level class could be useful. So as a fix, I suggest to make it a union not only of SIAs but all the services that return that data type. That can be deprecated in the future when/if a viable alternative becomes available.

Related to this, I've noticed that the CADC SIA service has 2 access URLs (one for each version) but the registry returns only one of them. TOPCAT shows both. Is that a bug or me not knowing how to use the registry search?

@msdemlei
Copy link
Contributor Author

msdemlei commented Jun 1, 2023 via email

@trjaffe
Copy link
Contributor

trjaffe commented Jun 13, 2023

FWIT, I agree with much of what Brigitta and Adrian said. I don't disagree with Markus' aims, but I don't like the current PR by itself. If we had an alternative workflow for users to find all-VO images, that would be different, but we don't yet. I would go for option 5 or 3 (duplication will always be with us) along with documenting what that means. I think astronomers can deal with the resulting duplication. If at some point, we have a better way, we can revisit this, but we do not as yet. This PR as implemented it seems to me will only add confusion.

@msdemlei
Copy link
Contributor Author

msdemlei commented Jun 14, 2023 via email

@bsipocz
Copy link
Member

bsipocz commented Aug 4, 2023

Given we have a long discussion here, I move my comment from #449 to here:

I would not like to go ahead with this deprecation, until we have a good/convenient alternative. I totally see the problems, but OTOH, the users should not really need to know about the versions, so doing an 'image regsearch should really just return both SIAv1 and v2. Then they can be used individually (assuming the search() functionality is fixed for sia2).

E.g. I strongly feel instead of this, one should be able to do one image search, that returns the ~90 services.

import pyvo
image_services_v1 = pyvo.regsearch(servicetype='sia')
image_services_v2 = pyvo.regsearch(servicetype='sia2')
v2_irsa = [s for s in image_services_v2 if 'irsa' in s.ivoid]
v1_irsa = [s for s in image_services_v1 if 'irsa' in s.ivoid]

The user should do a lot of digging after that anyway to see e.g. which "Spitzer" they want, but otherwise all the entries look so similar, that they should be in one result.

In [27]: v1_irsa[0]
Out[27]: ('ivo://irsa.ipac/2mass/images/asky-at', 'vs:catalogservice', '2MASS ASKY AT', '2MASS All-Sky Atlas Image Service', 'research', 'This service provides access to and information about the 2MASS All-Sky Atlas Images. Atlas Images delivered by this service are in FITS format and contain full WCS information in their headers. Additionally, the image headers contain photometric zero point information. 2MASS Atlas Images are suitable for quantitative photometric measurements.', 'https://irsa.ipac.caltech.edu/applications/2MASS/IM', '', 'archive', '', '', np.float32(nan), 'infrared', ['https://irsa.ipac.caltech.edu/cgi-bin/2MASS/IM/nph-im_sia?type=at&ds=asky&'], ['ivo://ivoa.net/std/sia'], ['vs:paramhttp'], ['std'])

In [28]: v1_irsa[1]
Out[28]: ('ivo://irsa.ipac/2mass/images/asky-ql', 'vs:catalogservice', '2MASS QL', '2MASS All-Sky Quicklook Image Service', 'research', 'This service provides access to and information about the 2MASS All-Sky Quicklook Images. The Quicklook Images delivered by this service are restored from lossy-compressed files in FITS format with full WCS information contained in the image headers. These images are suitable for position measurements, finding charts and visual inspection of the near-infrared sky.', 'https://irsa.ipac.caltech.edu/applications/2MASS/IM', '', 'archive', '', '', np.float32(nan), 'infrared', ['https://irsa.ipac.caltech.edu/cgi-bin/2MASS/IM/nph-im_sia?type=ql&ds=asky&'], ['ivo://ivoa.net/std/sia'], ['vs:paramhttp'], ['std'])

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants