ARAX responding with 400 errors with TRAPI validation errors when valid Knowledge Graphs are included in the query #2106
Comments
Does it possibly work any better if you set `auxiliary_graphs` to `{}` in your original query? This was one of the TRAPI 1.4.0 / 1.4.2 problems, I think.
@edeutsch Thanks for the suggestion. In the first query (fill_e1_reduced-kg.txt), changing it to … In the second query (fill_e1_no_arax_validation_error.txt), adding …
Hi @karafecho, I think the issue is just that `source_record_urls` and `auxiliary_graphs` can't be null in TRAPI 1.4.0. This is the TRAPI 1.4.0 vs. 1.4.2 problem that we discovered after release and decided wasn't worth a whole new release. If you just change … to … and … to … then it will work fine. Can you do that? Fixing the query will be a lot easier than fixing the code for the moment, and should allow you to proceed? Or is that not feasible? This does not generate errors:
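For anyone hitting the same 1.4.0-vs-1.4.2 mismatch, the fix described above amounts to replacing the nullable fields with empty containers before submitting. Here is a minimal sketch; the helper name and the recursive walk are illustrative, not ARAX or TRAPI code:

```python
def patch_trapi_nulls(obj):
    """Recursively replace null auxiliary_graphs / source_record_urls
    (legal in TRAPI 1.4.2, rejected by 1.4.0 validators) with empty
    containers. Hypothetical helper, not part of any Translator tool."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == "auxiliary_graphs" and value is None:
                obj[key] = {}   # 1.4.0 expects an object here, not null
            elif key == "source_record_urls" and value is None:
                obj[key] = []   # 1.4.0 expects an array here, not null
            else:
                patch_trapi_nulls(value)
    elif isinstance(obj, list):
        for item in obj:
            patch_trapi_nulls(item)
    return obj

# Example: a pared-down message fragment shaped like the attached queries
message = {
    "knowledge_graph": {
        "edges": {"e1": {"sources": [
            {"resource_id": "infores:cohd", "source_record_urls": None}]}},
    },
    "auxiliary_graphs": None,
}
patch_trapi_nulls(message)
```

Running the patched message through a strict 1.4.0 validator should then no longer trip the null-field errors.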
I'm not sure fixing the query will work. When I ran your query, it indeed passed the TRAPI validation test, but it did not return any results. [As an aside, when I sent the query to the ARS, I received a 598 timeout error, which shouldn't be the case, but that appears to be an ARS issue that @MarkDWilliams will need to explore.] @maximusunc explored the issue a bit. From what he reported, I think there are additional issues beyond the TRAPI validation error that are preventing the query from both passing TRAPI validation in ARAX and returning results. That said, Max and Abrar are re-engineering the Path A CQS query that drove this issue, so we may be able to independently resolve it. Will keep you posted ...
I'll add one other issue that Max identified. Specifically, I think ARAX is considering … Just thought I'd point that out ...
Well, I was just resolving the validation errors. I don't know anything about the query itself or what is expected of it. I'm not sure I understand about the infores. I think we would go by how it's registered in SmartAPI. So if the SmartAPI name is different from what you specify, then it probably won't work. I think you need to specify the SmartAPI key? I'm not certain.
Here's a stack trace that I get when trying to send a …:
This list of KPs doesn't include any of the automat services, but they are all valid KPs. Could this be looked into, please?
@amykglen would you be able to look into item 2 above?
Sorry, we did just discover a bug where the Automats on dev weren't registered on SmartAPI correctly. Could you just try refreshing your KP list, please?
ah yes, I forgot about that. We reported it in NCATSTranslator/Feedback#557. Refreshing should be automatic within 10 minutes. Is it resolved now?
Nevermind, it was likely a SmartAPI registration issue.
I'm seeing green on the ARAX page for the Automats, but I'm still getting the same error when sending a query to ARAX.
okay, thanks for checking and reporting. @maximusunc can you remind us which endpoint you're testing against? @amykglen would you look into this? Specifically:
We're hitting your dev instance: https://arax.ncats.io/beta/api/arax/v1.4/query
yep, I'll look into it @edeutsch! @maximusunc - can you provide an example query that produces the issue you're seeing? That would be super helpful. (I've tried running some of the queries linked earlier in this issue, but none are producing the issue you're reporting.)
Sure:

```json
{
  "workflow": [
    {
      "id": "fill",
      "parameters": {
        "allowlist": [
          "infores:cohd",
          "infores:automat-icees-kg"
        ],
        "qedge_keys": [
          "e0"
        ]
      }
    }
  ],
  "message": {
    "query_graph": {
      "edges": {
        "e0": {
          "exclude": false,
          "predicates": [
            "biolink:correlated_with",
            "biolink:associated_with_likelihood_of"
          ],
          "subject": "n0",
          "object": "n1"
        }
      },
      "nodes": {
        "n0": {
          "ids": [
            "MONDO:0009061"
          ],
          "is_set": false
        },
        "n1": {
          "categories": [
            "biolink:MolecularEntity",
            "biolink:EnvironmentalExposure"
          ],
          "is_set": true
        }
      }
    }
  },
  "log_level": "DEBUG"
}
```
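For anyone reproducing this, a query like the one above can be submitted programmatically to the dev endpoint quoted earlier in the thread. A minimal standard-library sketch (the timeout value is an arbitrary choice, and the actual send is left commented out since `fill` operations can run for minutes):

```python
import json
import urllib.request

# Dev instance URL quoted earlier in the thread
ARAX_URL = "https://arax.ncats.io/beta/api/arax/v1.4/query"

# Same shape as the example query in this comment
query = {
    "workflow": [
        {"id": "fill",
         "parameters": {"allowlist": ["infores:cohd", "infores:automat-icees-kg"],
                        "qedge_keys": ["e0"]}}
    ],
    "message": {"query_graph": {
        "edges": {"e0": {"exclude": False,
                         "predicates": ["biolink:correlated_with",
                                        "biolink:associated_with_likelihood_of"],
                         "subject": "n0", "object": "n1"}},
        "nodes": {"n0": {"ids": ["MONDO:0009061"], "is_set": False},
                  "n1": {"categories": ["biolink:MolecularEntity",
                                        "biolink:EnvironmentalExposure"],
                         "is_set": True}}}},
    "log_level": "DEBUG",
}

req = urllib.request.Request(
    ARAX_URL,
    data=json.dumps(query).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# Uncomment to actually submit:
# with urllib.request.urlopen(req, timeout=600) as resp:
#     response = json.load(resp)
```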
thanks @maximusunc! well, that query just completed successfully for me (https://arax.ncats.io/beta/?r=172033), but it appears that it might only have worked this time because it happened to trigger a cache refresh on our end:
I thought the ARAX background tasker was supposed to update the KP info cache on something like an hourly basis, so I'm not sure why it was so out of date (this is on …).
Yes, it's possible there is a bug. The Background Tasker and the kp_info_cacher.py were substantially changed in the …
One thing to check is that the Background Tasker process is currently running, for …
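The pattern being debugged here — a background worker that refreshes the KP cache on a timer and emits a heartbeat — can be sketched roughly as below. The class and names are illustrative only, not ARAX's actual BackgroundTasker or kp_info_cacher code; the demo uses a tiny interval so it finishes quickly:

```python
import threading
import time

class PeriodicRefresher:
    """Illustrative periodic refresher with a heartbeat; just the shape
    of the pattern discussed in this thread, not the real ARAX code."""
    def __init__(self, refresh_fn, interval_sec=3600.0):
        self.refresh_fn = refresh_fn
        self.interval_sec = interval_sec
        self.pings = []                  # heartbeat timestamps
        self._stop = threading.Event()

    def run(self):
        while not self._stop.is_set():
            self.pings.append(time.time())  # proves the loop is still alive
            try:
                self.refresh_fn()           # e.g. rebuild the KP info cache
            except Exception as err:
                # A silent death here is exactly the failure mode suspected above,
                # so at minimum the exception should be logged, not swallowed.
                self.pings.append(f"refresh failed: {err}")
            self._stop.wait(self.interval_sec)

    def stop(self):
        self._stop.set()

# Demo with a tiny interval
refreshes = []
tasker = PeriodicRefresher(lambda: refreshes.append("refreshed"), interval_sec=0.01)
worker = threading.Thread(target=tasker.run, daemon=True)
worker.start()
time.sleep(0.1)
tasker.stop()
worker.join(timeout=1)
```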
yeah, apparently not:
Died on the 27th |
Is that about when we did the merge to master? |
I wonder if we should stop and restart …
restarting …
I'm looking through the elog file for clues. Will you save that first?
Shoot, just restarted, sorry. |
My bad, Eric; I should have thought to save a copy of the elog file. |
Eric, if you still have it open in …
Here's what I see in the elog:
The next ping from the BackgroundTasker should have been at 14:39:51, but it was not heard from again. |
Eric, I note that the call to …
yes, possible. Although it really should have spewed a stack trace to the elog then?
I still have the inode open in …
I mused about the possibility of the BackgroundTasker dying before, but had not seen it until now.
Good point. Yes, I would expect that.
I can confirm that later in the log Expander noticed the problem and refreshed after 24 hr:
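The fallback Expander used here — refresh whenever the cache is older than 24 hours — comes down to a simple timestamp comparison. A sketch with made-up names, not the actual Expander code:

```python
import time

CACHE_MAX_AGE_SEC = 24 * 3600  # 24 hr, matching the behavior described above

def cache_is_stale(last_refresh_time, now=None, max_age=CACHE_MAX_AGE_SEC):
    """Return True when the cached data is older than max_age seconds.
    Hypothetical helper mirroring the fallback described in the thread."""
    now = time.time() if now is None else now
    return (now - last_refresh_time) > max_age

# One-hour-old cache vs. a 25-hour-old cache (fixed timestamps for clarity)
fresh = cache_is_stale(last_refresh_time=1000.0, now=1000.0 + 3600)
stale = cache_is_stale(last_refresh_time=1000.0, now=1000.0 + 25 * 3600)
```

Because the periodic tasker can die silently, this kind of staleness check in the consumer (Expander) acts as a safety net, at the cost of one slow request per day.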
So maybe query_tracker.check_ongoing_queries() is hung? Can you tell if the child process is still running or gone? E.g., do you know the PID?
I did look at the process table, but there were a bunch of child processes (i.e., running queries), so it was hard to determine whether any of them was the background tasker child process. I think maybe we should patch the background tasker module to print its PID in status messages.
At this point, if a remnant background tasker process were still running, it would (I assume) no longer show up as a "child process" in the process table, but just as a regular process. I think it is unlikely to still be hanging around (in a stuck state) given the SIGTERM handler code in there, but I guess it is possible.
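Both ideas in the last two comments — printing the PID in status messages so the process is findable in `ps` output, and a SIGTERM handler so the process exits cleanly instead of lingering — are easy to sketch. This is illustrative code, not the actual ARAX handler:

```python
import os
import signal

shutting_down = False

def handle_sigterm(signum, frame):
    """On SIGTERM, set a flag so the main loop can exit cleanly
    rather than leaving a stuck orphan process behind."""
    global shutting_down
    shutting_down = True

signal.signal(signal.SIGTERM, handle_sigterm)

def status_message(text):
    # Including the PID makes this process easy to pick out of the
    # process table later, per the suggestion above.
    return f"[pid {os.getpid()}] {text}"

msg = status_message("BackgroundTasker ping")
# Simulate receiving a SIGTERM; the handler just sets the shutdown flag
os.kill(os.getpid(), signal.SIGTERM)
```

A real worker loop would check `shutting_down` on each iteration and break out, logging a final status message before exiting.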
OK, let's take a look for an orphaned, stuck background tasker process. Here is the current list of python processes:
The only PID without a child process is 4133.
So process 4133 is listening on port 5000:
So that is the ARAX production server, which is still running the old thread code and wouldn't be expected to have a background tasker child process. Conclusion: I didn't find an orphaned background tasker process in the process table.
Issue: When submitting a query to ARAX that includes a message with a knowledge graph, ARAX responds with a 400 error with the following data:
where the contents of {...} in the detail are the content of the knowledge graph.
However, when I run these queries through the Reasoner Validator (version 3.8.5) against TRAPI 1.4.2, the query passes validation.
Here's a sample query that will generate the error. The knowledge graph has been stripped down to a single edge and 2 nodes with minimal content.
fill_e1_reduced-kg.txt
If we strip out the contents of the knowledge graph `nodes` and `edges` completely and also remove `"auxiliary_graphs": null` (`auxiliary_graphs` is nullable in the spec, but ARAX also issues a warning about it being None), ARAX no longer returns TRAPI validation errors (different errors are returned, but these errors are expected).

fill_e1_no_arax_validation_error.txt
This is not a priority issue for September.
Context: The Translator Clinical Data Committee is working on developing some workflows with the Workflow Runner. In our "Path A" workflow, we want to issue a `fill` operation on the first edge with a specific `allowlist` for clinical KPs, followed by a second `fill` operation for the remaining edges using any KP.