Epoprostenol used to treat rats #2243

edeutsch · 2024-03-01T19:36:47Z

I was assigned this issue by TAQA:
NCATSTranslator/Feedback#707

Apparently xDTD was trained with KG2-SemMedDB that asserts that Epoprostenol is used to treat rats. And there are lots of papers describing treatment of rats with Epoprostenol. But apparently this is not an appreciated answer.

It is unclear to me whether we just want to remove such SemMedDB edges in KG2

Or whether the xDTD training data can be refined to exclude Drug-treats-X edges where X is a species.

Or whether this problem goes away on its own with the upcoming KG2 "treats" refactor, where I assume we should make an effort to ensure that ideas like:
Drug X was used to attempt to treat disease Y in species Z
are NOT encoded as:
Drug X treats species Z

Anyone have ideas on how to handle the TAQA issue?

edeutsch · 2024-03-01T23:35:13Z

Bill stated it more elegantly than I did. Do we/can we employ domain and range constraints to avoid this kind of thing:
NCATSTranslator/Feedback#707 (comment)

amykglen · 2024-03-04T16:56:31Z

The KG2 API does actually filter out edges that violate such domain/range specifications, but they're still in the underlying KG2c graph, which xDTD is trained on (I think). Maybe those edges should be excluded from the graph used for training? They're easily identifiable by the domain_range_exclusion property. (There are 3.8 million such edges in KG2c - about 8% of the total edges.)

saramsey · 2024-03-11T20:54:09Z

Do we need a fix for this in the Lobster release? Hoping the answer is no, and that we can instead aim to fix this in the Octopus release?

saramsey · 2024-03-11T20:56:49Z

I'm not sure that I am informed enough to have an opinion about whether or not we should include edges with domain_range_exclusion set to True (i.e., excluded edges) in the graph used for training xDTD. But it seems like we should (somehow) ensure that ARAX isn't returning results for which the key edge basis is an excluded edge. I'm fine with the idea of adding a filter for this, if that is what people feel is best. @dkoslicki @chunyuma @amykglen what do you think?

chunyuma · 2024-03-12T15:01:36Z

Hi @edeutsch and @saramsey, I think both solutions (1. use filtered KG to train xDTD; 2. add a filter to the xDTD outputs) work for this issue. However, I will say option 2 will be easier and more flexible considering the long training time of xDTD. For option 1, are we sure that the edges with domain_range_exclusion=True include all edges that we would like to be excluded for training? Or are they just a subset of them? If the domain_range_exclusion=True includes all, then we can exclude those edges in training.

amykglen · 2024-03-12T21:13:49Z

> @amykglen what do you think?

Adding a filter seems fine to me - and I take back my statement that those edges should be removed from the training dataset specifically, ha - I don't know enough about xDTD to know whether that would make sense. But I agree with Steve that at least the results that ARAXInfer returns shouldn't include domain_range_exclusion=True edges, however it makes sense to achieve that.

> For option 1, are we sure that the edges with domain_range_exclusion=True include all edges that we would like to be excluded for training? Or are they just a subset of them?

I think @saramsey or @sundareswarpullela or @acevedol know more about this than me, but from what I can tell, only SemMedDB edges are marked as domain_range_exclusion=True (where appropriate). However, I'm guessing that SemMedDB is the main 'problem' source for edges with invalid domain/range anyway, so maybe that is sufficient?

dkoslicki · 2024-03-13T14:48:46Z

@chunyuma since it takes so long to re-train xDTD, what about the following path forward:

1. Add the filter to the xDTD output.
2. As time permits, update the xDTD training code to exclude such edges. No need to do a full re-build until a new version of KG2 warrants it.
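
The first step above (filtering the xDTD output) could look roughly like the sketch below. It is illustrative only: the prediction records, the `path_edge_ids` field, and the `filter_predictions` helper are all hypothetical and do not reflect the actual xDTD output format or ARAX code.

```python
def filter_predictions(predictions, excluded_edge_ids):
    """Drop any prediction whose supporting path uses an excluded edge.

    `predictions` is a list of dicts with a hypothetical `path_edge_ids`
    field; `excluded_edge_ids` holds the IDs of edges flagged with
    domain_range_exclusion (both are assumptions for illustration).
    """
    excluded = set(excluded_edge_ids)
    return [
        p for p in predictions
        if not any(edge_id in excluded for edge_id in p["path_edge_ids"])
    ]


# Made-up example data: edge "e2" stands in for a "Drug treats species" edge.
predictions = [
    {"drug": "DrugA", "disease": "DiseaseB", "path_edge_ids": ["e1", "e2"]},
    {"drug": "DrugA", "disease": "DiseaseC", "path_edge_ids": ["e3"]},
]
kept = filter_predictions(predictions, excluded_edge_ids={"e2"})
print([p["disease"] for p in kept])
```

Filtering at output time leaves the trained model untouched, which is why it avoids the long re-training mentioned above.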

chunyuma · 2024-03-13T17:20:50Z

Sure, I can add a filter to the xDTD output. Where can I find the edge attribute domain_range_exclusion? I can't find it in the edges_c.tsv file of KG v2.8.4.

amykglen · 2024-03-14T18:35:19Z

Huh, that's weird. I see it in my copy of KG2.8.4c:

ubuntu@ip-172-31-48-160:~/plater-plover$ cat edges_c_header.tsv 
subject	object	predicate	primary_knowledge_source	publications:string[]	publications_info	kg2_ids:string[]	qualified_predicate	qualified_object_aspect	qualified_object_direction	domain_range_exclusion	id	:TYPE	:START_ID	:END_ID

Also note that currently the values for domain_range_exclusion are strings ("True" or "False"), though eventually they will be switched to actual booleans (see #2185). So you might want to set up your code to handle either strings or booleans.
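
A minimal sketch of handling both representations, assuming the flag arrives either as the string "True"/"False" or as a real boolean. The helper name `is_excluded` and the sample edges are hypothetical and not taken from the KG2 or ARAX code; treating a missing value as "not excluded" is also an assumption.

```python
def is_excluded(value):
    """Return True if a domain_range_exclusion value marks an edge as excluded.

    Accepts real booleans as well as the strings "True"/"False".
    Anything else (e.g. None) is treated as not excluded, by assumption.
    """
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        return value.strip().lower() == "true"
    return False


# Made-up edges mixing the current string form and the future boolean form.
edges = [
    {"id": "e1", "domain_range_exclusion": "True"},
    {"id": "e2", "domain_range_exclusion": "False"},
    {"id": "e3", "domain_range_exclusion": True},
    {"id": "e4", "domain_range_exclusion": False},
]
kept_ids = [e["id"] for e in edges if not is_excluded(e.get("domain_range_exclusion"))]
print(kept_ids)
```

Checking `bool` first matters here: a naive truthiness test would treat the non-empty string "False" as true.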

chunyuma · 2024-03-14T18:39:25Z

Thanks @amykglen! I will check it again.

chunyuma · 2024-03-17T21:47:11Z

Hi team,

I have already updated the xDTD database for KG2.8.4 to exclude all edges with domain_range_exclusion==True. It should now solve this issue. I tested test_ARAX_infer.py but got an error reported in issue #2252.

amykglen · 2024-03-19T01:05:25Z

hey @chunyuma - I just responded in #2252 about the error you're seeing

chunyuma · 2024-03-19T14:03:20Z

Thanks @amykglen. Now the updated xDTD database has passed the Infer tests. I think we can verify this solution for this issue after deployment.

edeutsch assigned saramsey and dkoslicki Mar 1, 2024

chunyuma added a commit that referenced this issue Mar 17, 2024: "update xdtd database to exclude edges with domain_range_exclusion==True" (ca5f659)

chunyuma added the verify in next deployment label Mar 19, 2024
