Ensure more creative results are shown in our results #2157

kvnthomas98 · 2023-10-05T15:27:27Z

Currently Lookup Results may dominate the results.
If we have a creative query, we need to ensure that creative results don't get filtered out.

Suggestions proposed by @dkoslicki:
i) Manually place creative results on top.
ii) Interleave creative results between lookup results.
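
To make option (ii) concrete, here is a minimal sketch of interleaving two already-ranked lists. The function name `interleave` and the plain-list inputs are assumptions for illustration only; this is not how ARAX currently orders results.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel so that None can still be a legitimate result


def interleave(creative, lookup):
    """Alternate creative and lookup results, preserving each list's own order.

    Hypothetical helper: both inputs are assumed to be ranked best-first.
    """
    merged = []
    for c, l in zip_longest(creative, lookup, fillvalue=_MISSING):
        if c is not _MISSING:
            merged.append(c)
        if l is not _MISSING:
            merged.append(l)
    return merged
```

With two creative results and three lookup results, the creative ones land in positions 1 and 3 rather than after every lookup result; once the shorter list runs out, the rest of the longer list follows in order.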

saramsey · 2023-10-24T17:38:34Z

Slotting this on the agenda for tomorrow's AHM. Good topic for group discussion.

saramsey · 2023-10-24T17:39:25Z

I am intrigued by the "interleave" idea. While xDTD is awesome, I have concerns about a modification such that creative mode results are always (and only) at the top; I can expound on that tomorrow at the meeting.

dkoslicki · 2023-10-25T18:09:09Z

Thoughts from the AHM:

Different possible approaches include:

i) (easy, but xDTD could be on later pages of the UI) Rank the lookups, take the xDTD results, then take n from the former and m from the latter, where n+m < 500 (or whatever the specified cutoff is).
ii) (harder, but proper) Get rid of the noise/cruft in the lookup results. There is no way there are >500 drugs that treat a given disease. I suspect this is due to SemMedDB. A naive approach would be to impose "we never return more than N lookups", where N is small (e.g., 25). A nuanced approach would be to see what's causing the explosion of lookup results, and downrank them in the ranker.
iii) (not ideal, but fast) Just interleave the results and check with the UI team whether this will make the UI show the creative results.
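
A rough sketch of the first ("easy") approach, assuming both lists are already ranked best-first. The names `merge_with_quota`, `cutoff`, and `min_creative` are hypothetical, and the sketch allows up to `cutoff` results in total rather than strictly fewer:

```python
def merge_with_quota(lookup_ranked, creative_ranked, cutoff=500, min_creative=50):
    """Take n lookup results and m creative results such that n + m <= cutoff.

    Hypothetical helper: `min_creative` slots are reserved for creative
    (xDTD) results first, so a flood of lookup results cannot push them out.
    """
    # Reserve slots for creative results (as many as exist, up to the quota).
    m = min(len(creative_ranked), min_creative, cutoff)
    # Lookups fill whatever room is left.
    n = min(len(lookup_ranked), cutoff - m)
    # If lookups did not use all their room, hand the spare slots to creative.
    m = min(len(creative_ranked), cutoff - n)
    # Lookups come first, which is why creative results may sit on later UI pages.
    return lookup_ranked[:n] + creative_ranked[:m]
```

Setting `min_creative` high and capping lookups at a small N would move this toward the naive version of the second approach.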

dkoslicki · 2023-10-25T18:15:20Z

Eg. of a bazillion lookups: https://arax.ncats.io/?r=174764
Any common(ish) disease will do

saramsey · 2023-10-30T17:48:49Z

@dkoslicki As a test (and since there was a TRAPI query for it in #2187) I ran "what drugs treat multiple sclerosis" through ARAX (the arax.ncats.io/beta endpoint) using knowledge_type="inferred". I got 500 results. The first 50 results look pretty reasonable, with a minority of experimental/investigational treatments in there (vitamin D, epigallocatechin, cannabidiol, estriol, ibudilast, melatonin, biotin, etc.). Below the first 50, we start to get some really broad categories like "Antibodies" or "Interferons" or "Vaccines", or "Vitamins" or "immunomodulators". We also start to get some puzzling results like "ethylene glycol" (which may reflect text-mining getting confused by text about PEGylation of some other therapeutic agent). Below the first 150 results, we do start to see increased frequency of crazy stuff like "caffeine", "fish oils", "ketamine", "nicotine", "tadalafil", and so forth.

I think there are four things driving such a large number of lookup results:

i) Insufficient canonicalization. I see a lot of essentially the same results that are repeated with slightly different names. Like "glatiramer" and "glatiramer acetate", that kind of thing.
ii) We have general terms like "Vaccines", "Antibodies", "Cannabinoids", "Immunoglobulins", etc. that are cluttering up the results.
iii) We have a lot of drugs that are on the list because they are being intensively studied for efficacy in MS. Theoretically, if we were to filter to get only the drugs that are marketed (i.e., indicated) for MS (and I'm not suggesting we do that in practice), we'd see the result list length drop by probably 8X to 10X.
iv) Drugs that are used to treat comorbidities of MS rather than MS itself (e.g., tadalafil or what have you).

I think our scores are, overall, a bit too high for the drugs that are not indicated for MS (e.g., the investigational treatments). For the drugs that are indicated for MS, the scores are fine.

Our scores are way too high for the overly general stuff like "Vaccines" and stuff like that. Ideally, those should be either filtered out or have their score reduced due to the concepts' generality. I know we've talked about this a lot, I guess I'm just echoing the feeling here that it would be good if we weren't seeing "antibodies" and "vaccines" and "vitamins" in the results.
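
One way the "reduce the score due to generality" idea could look, as a sketch only. The term list, the `penalty` factor, and the `(name, score)` tuple shape are all assumptions; a real implementation would more likely derive generality from the ontology than from a hard-coded set.

```python
# Hypothetical list of overly general concept names, taken from the examples above.
GENERAL_TERMS = {"vaccines", "antibodies", "vitamins", "interferons", "immunomodulators"}


def downweight_general(results, penalty=0.5, general_terms=GENERAL_TERMS):
    """Multiply the score of overly general results by `penalty`, then re-sort.

    `results` is assumed to be a list of (name, score) tuples.
    """
    rescored = []
    for name, score in results:
        if name.strip().lower() in general_terms:
            score = score * penalty
        rescored.append((name, score))
    # Highest score first; sorted() is stable, so ties keep their input order.
    return sorted(rescored, key=lambda r: r[1], reverse=True)
```

Setting `penalty=0` pushes the general terms to the bottom instead of removing them, which may be a gentler option than filtering them out.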

saramsey · 2023-10-30T17:50:32Z

So in conclusion, I concur, there really aren't 500 different treatments for M.S. But there are probably at least 60-70 that are used to manage M.S. (remember it's a complex multi-faceted disease for which there is AFAIK no cure), plus another 100 to 150 being actively investigated.

dkoslicki · 2023-10-30T19:26:08Z

@saramsey do we have a KP or edge property that we can use explicitly for "indicated for"? IIRC, when we ask for treats edges, KPs don't distinguish between investigational and indicated. Perhaps there's something in KG2 we could use to cross-check?

saramsey · 2023-10-30T19:32:39Z

@dkoslicki I am not sure. It is a problem that the biolink:treats is being used for investigational/experimental therapies like vitamin D. I think the Biolink people and the Predicates WG people are working on "refactoring" the biolink:treats predicate to allow more precise statements for such cases.

In the meantime, I like the idea of trying to pull in that information from somewhere. I am not sure about where we could get it, though. I guess if someone were to go through all 500 results and label them as "indicated", "investigational", and "neither" (this would take an afternoon though!), we could try to find which sources are contributing to the "indicated" vs. "investigational". I suspect there will be a bias towards certain sources.
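
If someone did produce those hand labels, the per-source tally could be as simple as the sketch below. The dict keys `label` and `sources` are an assumed shape for the labeled data, not an existing ARAX structure.

```python
from collections import Counter, defaultdict


def tally_sources_by_label(labeled_results):
    """Count, for each source, how many results carry each manual label.

    Each item is assumed to look like:
        {"label": "indicated" | "investigational" | "neither", "sources": [...]}
    """
    counts = defaultdict(Counter)
    for result in labeled_results:
        for source in result["sources"]:
            counts[source][result["label"]] += 1
    return counts
```

A source whose counts are dominated by "investigational" or "neither" would be a candidate for down-weighting in the ranker.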

dkoslicki · 2023-10-30T19:46:09Z

https://www.fda.gov/drugs/drug-approvals-and-databases/drugsfda-data-files perhaps? Doesn't cover biologics and the like, though.

amykglen · 2023-10-31T17:41:31Z

While I'm not aware of something like indicated_for edges that capture which drugs are FDA approved to treat which conditions (that seems very useful), we do have the ability to constrain queries on FDA approval status (#1599, which makes use of KG2 data (#1497)). It wouldn't let us filter down the result set to drugs approved specifically for MS, but maybe it would at least get rid of general terms and drugs not yet approved for anything?

dkoslicki · 2023-10-31T18:35:07Z

Ah, I wasn't aware of that. Should be a good first pass, so @kvnthomas98 please do make note of Amy's comment once you start working on this.

saramsey · 2023-11-01T18:18:28Z

Thank you @amykglen, good suggestion

kvnthomas98 assigned dkoslicki and kvnthomas98 Oct 5, 2023
