-
-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pyvo queries failing with "file not VOTABLE" error #403
Comments
On Mon, Jan 23, 2023 at 07:59:47AM -0800, Bryan Hilbert wrote:
Hello. The pyvo queries that JWQL performs have suddenly started
crashing with an error that "File does not appear to be VOTABLE".
That's temporary. As a matter of fact, MAST on its TAP endpoint
currently returns an HTML document to the effect that
The MAST site or service you are attempting to reach is currently
down for maintenance.
Please try again later.
Admittedly, this is a totally reasonable thing to do on the part of
MAST, and so I wonder if we should make it easier for people to
figure stuff like this themselves (I used a debugger and set a
breakpoint in the bowels of pyvo, which is not pretty).
The trouble to some extent is that we're trying to stream here, so
doing something sensible is hard: The offending input has probably
already been consumed by astropy by the time we get back control.
Hm. If someone has a good idea (in particular on the kind of API
that would let people figure this out more easily), I'd say let's go.
Otherwise, I'd say let's close this issue for now and wait for an
inspiration.
|
I cannot reproduce it any more so it's difficult to see what's available along with the error message after the initial votable parsing. We would need a toy example that returns this html to test whether the situation could be improved during errorn handling on the pyvo level, or in astropy, during parsing. So it may be possible that the right first step is to upstream this to astropy, even when it will land on the laps of the same people. E.g. it would be nice to fail on this with a more informative error, and parsing that out should be done in astropy. |
Hello! Yes, this issue occurred during scheduled downtime this morning for one MAST catalog, which briefly affected a wider array of our TAP and other services than planned. If folks think it might be useful to handle this more gracefully in the general case, we can provide an example of this general downtime announcement framework in action. That said, very specifically to TAP, MAST is actively moving services off the platform that had this issue today. The target architecture and its downtime announcements should match our other Python/astropy.mast behavior as we finish this up and port support for each catalog in TAP. (Whether that is any different, I don't actually know, but there are others who can chime in.) |
On Mon, Jan 23, 2023 at 09:49:19AM -0800, Brigitta Sipőcz wrote:
I cannot reproduce it any more so it's difficult to see what's
available along with the error message after the initial votable
Well, a reproducer at its core is as simple as
from astropy.io import votable
import io
src = io.BytesIO(b"<html><title>Hello World</title></html>")
votable.parse(src)
*if* we consider this to be a VOTable problem ("provide more
informative diagnostics when the VOTable read fails").
possible that the right first step is to upstream this to astropy,
even when it will land on the laps of the same people. E.g. it
would be nice to fail on this with a more informative error, and
parsing that out should be done in astropy.
The trouble with this is that "more informative" is really hard. You
can't possibly output the whole document (which may be really large),
and at least astropy (that doesn't have the slightest idea of what it
might be fed) can't, I think, make even educated guesses how to
extract something useful as an indication of the sort of content it
got.
Hence, I think if we upstream the bug, we'll first have to develop
some idea what we'd like to have (and I don't know where to start).
At our end, on the other hand, it *would* be a lot simpler. I think
an entirely sane behaviour would be to have, in some code wrapping
requests in the purview of DALI: "Service trouble: document at <url>
should be a VOTable but is not", leaving it to the users to paste
the url into their browsers and read the operator messages (btw,
making such a proposal non-security-critical is another argument for
having javascript off by default web browsers).
The trouble with this plan that I'd otherwise consider the way to go:
<url>. Many of our requests are POST requests, and some may have
extra state encoded (in particular auth).
The next best idea I have would be to make sure we can rewind the
first kilobyte or so of the payload (I'm sure there are off-the-shelf
solutions for that with requests). Then, if we get a parser error,
we inspect that first kilobyte and see if it looks like HTML. If it
does, we heuristically pull out the content of the first few body
elements and emit perhaps the first 200 chars of that (or so).
It sounds like an ugly hack, yes, but I think it'd greatly improve
user experience in at least 80% of the cases resembling this one.
|
Does the service return a 200 response status? Shouldn't that be a 503? That (and a simple text body) should make |
I suppose it would be nice to see more non 200 status codes, but in my experience the behaviours are all over the place, many servers seem to just focus on the browser experience and return a valid 200 with a message about the downtime embedded in the html. |
How about the |
On Tue, Jan 24, 2023 at 02:03:26PM -0800, Adrian wrote:
How about the `Content-Type` header? Not sure if `PyVO` checks that
before attempting to parse the VOTable but it should.
Good point. Also, I would have zero qualms about breaking services
that get that one wrong.
Note that we have to accept both application/x-votable+xml and
text/xml as VOTable media types. By the protocol, we *could* force a
particular media type using FORMAT (as the standard requires passing
that through); but I suspect that would break a substantial number of
services, and IMHO that regulation has been hare^Wnot terribly well
justified in the first place.
So: absolutely +1 on only trying to parse VOTables with one of the
two media types. For other responses, I'd say we ought to simply
print a snippet of the payload, perhaps the first 200 bytes or so,
with some sort of configuration variable letting users get more.
|
Hello. The pyvo queries that JWQL performs have suddenly started crashing with an error that "File does not appear to be VOTABLE". Here is our query:
The text was updated successfully, but these errors were encountered: