IndexError when trying to use "special access" ESO archive query ("cas" url) #2994

Open
vandalt opened this issue Apr 23, 2024 · 2 comments

vandalt commented Apr 23, 2024

Hi!

When using the Eso class to access the regular "raw" archive,
everything works as expected (see first snippet below).

However, when using the "special access" archive by setting QUERY_INSTRUMENT_URL = "http://archive.eso.org/wdb/wdb/cas", I get an IndexError: list index out of range (see second snippet below, showing full error message as well). The error does not occur with astroquery<0.4.7.

I tried to investigate a bit more. The issue seems to be that the session is authenticated only at download time (here), but not when retrieving the file table. More specifically, the self._request() call in self.query() to get instrument_form seems to cause the issue (here). For the main archive, which requires no auth to get a table, I get a form webpage, as expected. However, for the special archive, I get the login page in the response. I tried setting the headers kwarg manually but that did not work.
I have a minimal example showing this in the third snippet.
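To illustrate the diagnosis above, here is a minimal sketch of why a login page in the response ends in an `IndexError`: the page simply contains no form with the id the code looks for, so indexing the empty result list fails. The form id `queryform`, the helper names, and the two tiny HTML bodies are assumptions made up for illustration; they are not copied from astroquery or from the ESO pages.

```python
from html.parser import HTMLParser


class FormIdCollector(HTMLParser):
    """Collect the id attribute of every <form> element in an HTML page."""

    def __init__(self):
        super().__init__()
        self.form_ids = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            # attrs is a list of (name, value) pairs; the id may be absent
            self.form_ids.append(dict(attrs).get("id"))


def has_query_form(html, form_id="queryform"):
    """Return True if the page contains a <form> with the expected id.

    The default form id is an assumption, not a confirmed astroquery value.
    """
    collector = FormIdCollector()
    collector.feed(html)
    return form_id in collector.form_ids


# Hypothetical, heavily simplified page bodies (illustration only)
query_page = '<html><body><form id="queryform"></form></body></html>'
login_page = '<html><body><form id="login"></form></body></html>'

print(has_query_form(query_page))  # True: the expected form is present
print(has_query_form(login_page))  # False: indexing the matches would fail
```

A check like this before indexing the list of matching forms would let the caller raise a clearer "authentication required" error instead of a bare `IndexError`.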

Currently, my solution to access the special (CAS) archive is to use PyVO to directly do an authenticated TAP query (following the "programmatic access" examples from here).

The fact that special access worked for versions prior to 0.4.7 leads me to think the issue is related to #2681, so I'll include @szampier, @almicol, @Pharisaeus and @bsipocz, who were involved in the discussion for that PR.

I was wondering:

  • Is there a way to make web requests work with this authentication mechanism, or is the best way to access the special archive to use PyVO+TAP queries directly?
  • If PyVO+TAP is the only way, are there plans to migrate astroquery to using the TAP ESO interface?

Thank you!

Snippet 1: Working example with the main archive
# This example works and will give a table containing NIRPS calibrations
from astroquery.eso import Eso

eso = Eso()
eso.login(username="APERO")

criteria = {"instrument": "NIRPS", "night": "2024-03-26"}
table = eso.query_main(**criteria, cache=False)
print(table)
Snippet 2: Failing example with the special archive (works with `astroquery<0.4.7`)
# This example does not work and gives an error
from astroquery.eso import Eso

eso = Eso()
eso.login(username="APERO")

criteria = {"instrument": "NIRPS", "night": "2024-03-26"}
eso.QUERY_INSTRUMENT_URL = "http://archive.eso.org/wdb/wdb/cas"
table = eso.query_main(**criteria, cache=False)
print(table)

Which gives the following error

Traceback (most recent call last):
  File "/home/vandal/repos/astro/nirps-download/scratch/astroquery_not_working_cas.py", line 9, in <module>
    table = eso.query_main(**criteria, cache=False)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vandal/repos/astro/astroquery/astroquery/eso/core.py", line 455, in query_main
    return self._query(url, column_filters=column_filters, columns=columns,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vandal/repos/astro/astroquery/astroquery/eso/core.py", line 527, in _query
    instrument_response = self._activate_form(instrument_form,
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vandal/repos/astro/astroquery/astroquery/eso/core.py", line 98, in _activate_form
    form = root.find_all('form', id=form_id)[form_index]
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
IndexError: list index out of range
Snippet 3: Example requests to get instrument form

The first block gives an HTML form, the last two a login page.

# This example shows the astroquery function causing the issue (I think)
from astroquery.eso import Eso

# This will show an HTML form
eso_main = Eso()
eso_main.login(username="APERO")
url = eso_main.QUERY_INSTRUMENT_URL + "/eso_archive_main/form"
print(f"Trying with url {url}")
instrument_form = eso_main._request("GET", url, cache=False)
print(instrument_form.content)

# This will show a login page
# In the browser it shows the expected form, because I'm logged in
eso_cas = Eso()
eso_cas.login(username="APERO")
eso_cas.QUERY_INSTRUMENT_URL = "http://archive.eso.org/wdb/wdb/cas"
url = eso_cas.QUERY_INSTRUMENT_URL + "/eso_archive_main/form"
print(f"Trying with url {url}")
instrument_form = eso_cas._request("GET", url, cache=False)
print(instrument_form.content)

# This will also show a login page, despite attempt to use headers
eso_cas = Eso()
eso_cas.login(username="APERO")
eso_cas.QUERY_INSTRUMENT_URL = "http://archive.eso.org/wdb/wdb/cas"
url = eso_cas.QUERY_INSTRUMENT_URL + "/eso_archive_main/form"
print(f"Trying with url {url}")
instrument_form = eso_cas._request("GET", url, cache=False, headers=eso_cas._get_auth_header())
print(instrument_form.content)

print(eso_cas._get_auth_header())
@Pharisaeus

@vandalt my best guess is that the immediate issue comes from the fact that not all ESO APIs support JWT tokens as an authentication method. While file download, calselector, datalink and TAP accept them, the WDB query interface might not (maybe @almicol can confirm this).
The changes made by @szampier switched the code from parsing HTML and submitting forms to using a proper API, but this also meant using a proper programmatic authentication API and getting JWT tokens (instead of submitting a login form and extracting session cookies from the response). This is also what you get from _get_auth_header(): an Authorization: Bearer ey... header, which won't work if the server side expects a cookie instead :(

An immediate hacky solution to your problem could be to visit the page in your browser while authenticated, extract the session cookie (in most browsers it's something like F12 -> Data -> Cookies), and then include this cookie in the request (I'm not sure exactly how you can set or pass it in the astroquery code, but there is probably some parameter for that). Another option could be to use a Basic Auth header, but I'm not sure whether WDB handles that.
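For what it's worth, the difference between the two authentication styles can be sketched with the standard library alone. This only builds the requests and prints their headers; nothing is sent. The helper names, the cookie name and the placeholder values are hypothetical, and whether WDB would accept a cookie passed this way is untested.

```python
import urllib.request

# URL taken from Snippet 3 in the issue description
CAS_FORM_URL = "http://archive.eso.org/wdb/wdb/cas/eso_archive_main/form"


def bearer_request(url, token):
    """Build a request that carries a JWT in an Authorization header."""
    req = urllib.request.Request(url)
    req.add_header("Authorization", f"Bearer {token}")
    return req


def cookie_request(url, cookie_name, cookie_value):
    """Build a request that carries a session cookie instead of a token."""
    req = urllib.request.Request(url)
    req.add_header("Cookie", f"{cookie_name}={cookie_value}")
    return req


# Placeholder values: the real cookie name has to be read from the browser
jwt_req = bearer_request(CAS_FORM_URL, "TOKEN_PLACEHOLDER")
cookie_req = cookie_request(CAS_FORM_URL, "SESSION_COOKIE_NAME", "VALUE_FROM_BROWSER")

print(jwt_req.get_header("Authorization"))
print(cookie_req.get_header("Cookie"))
```

Passing either object to `urllib.request.urlopen` would actually send it. A server that only understands session cookies ignores the first header, which would explain getting the login page back.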


almicol commented Apr 24, 2024

Hi Thomas,

The astroquery.eso module has never officially supported special access, though the trick of configuring the endpoint to /cas instead of /eso did indeed work in versions < 0.4.7. To extend the functionality of the module, following the evolution of the archive, we have started revamping it, and decided to change the data retrieval and authentication parts first (now available in version 0.4.7), leaving the upgrade of the query part for a second stage.

During this transition phase, the /eso => /cas trick cannot work, because of the different authentication method. WDB does not support basic authentication.

As said, the query part will use TAP, though this activity still has to find a time slot. At the moment I am working to expose all the instrument-specific tables via TAP, a prerequisite step; later a software engineer will upgrade the module.

In conclusion:

  • You are certainly doing the right thing when using TAP+pyvo for special access;
  • At the moment, TAP is the only programmatic way to achieve what you want.
  • Sometime in the not-too-distant future, the astroquery eso module will support special access.
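For readers landing here, a rough sketch of the TAP+pyvo route could look like the following. Everything not mentioned in this thread is an assumption to be checked against the ESO programmatic access documentation: the token URL and its parameters, the TAP endpoint, the `dbo.raw` table and its column names, and all helper names. The network-facing functions import `requests` and `pyvo` lazily and are not called here.

```python
# Assumed endpoints (verify against the ESO programmatic access pages)
ESO_TOKEN_URL = "https://www.eso.org/sso/oidc/token"
ESO_TAP_URL = "http://archive.eso.org/tap_obs"


def token_params(username, password):
    """Query parameters assumed to be expected by the ESO token endpoint."""
    return {
        "response_type": "id_token token",
        "grant_type": "password",
        "client_id": "clientid",
        "client_secret": "clientSecret",
        "username": username,
        "password": password,
    }


def get_token(username, password):
    """Request a JWT from the (assumed) token endpoint."""
    import requests  # third-party, imported lazily

    response = requests.get(
        ESO_TOKEN_URL, params=token_params(username, password), timeout=30
    )
    response.raise_for_status()
    return response.json()["id_token"]


def authenticated_tap(token):
    """Return a pyvo TAP service whose session sends the Bearer token."""
    import pyvo  # third-party, imported lazily
    import requests

    session = requests.Session()
    session.headers["Authorization"] = f"Bearer {token}"
    return pyvo.dal.TAPService(ESO_TAP_URL, session=session)


def build_query(instrument, top=10):
    """ADQL against an assumed raw-data table with assumed column names."""
    return (
        f"SELECT TOP {int(top)} dp_id, instrument, exp_start "
        f"FROM dbo.raw WHERE instrument = '{instrument}'"
    )


print(build_query("NIRPS"))
```

With valid credentials, usage would be along the lines of `tap = authenticated_tap(get_token(user, password))` followed by `tap.search(build_query("NIRPS")).to_table()`.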
