-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Finalise MedGen xref table - Updated MedGen ROBOT template #7467
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@matentzn Just double checking, we don't need to worry about "conflicts" anymore, right? At this point, we're trusting MedGen as the source for these mappings, so I don't need to run any such analysis, yes?
Context: Previous ad-hoc MedGen conflicts goals
# MedGen conflicts (Aug 2023) pipeline
# - https://github.com/monarch-initiative/mondo-ingest/issues/273
.PHONY: address-medgen-conflicts-aug2023
address-medgen-conflicts-aug202x3: address-medgen-conflicts-aug2023-deletes address-medgen-conflicts-aug2023-adds
# - MedGen conflicts: dependencies
tmp/July2023_CUIReports_FromMedGentoMondo.xlsx:
mkdir -p tmp && wget "https://github.com/monarch-initiative/mondo-ingest/files/12029712/July2023_CUIReports_FromMedGentoMondo.xlsx" -O $@
# - MedGen conflicts: analysis
tmp/mondo-edit.obo.tmp.diff: mondo-edit.obo.tmp
-diff mondo-edit.obo mondo-edit.obo.tmp > $@
tmp/report-qc-medgen-conflicts-update-diff.tsv: tmp/mondo-edit.obo.tmp.diff
python ../scripts/medgen_conflicts_removals_diff_analysis.py -i tmp/mondo-edit.obo.tmp.diff -o $@
tmp/report-qc-medgen-conflicts-adds-diff.tsv: $(TEMPLATES_DIR)/ROBOT_addMedGen_fromConflictResolution.tsv $(TEMPLATES_DIR)/ROBOT_addMedGen_fromIngest.tsv
python ../scripts/medgen_conflicts_add_xrefs_diff_analysis.py -c $(TEMPLATES_DIR)/ROBOT_addMedGen_fromConflictResolution.tsv -i $(TEMPLATES_DIR)/ROBOT_addMedGen_fromIngest.tsv -o $@
# - MedGen conflicts: deletes
.PHONY: address-medgen-conflicts-aug2023-deletes
address-medgen-conflicts-aug2023-deletes: mondo-edit.obo.tmp
mv mondo-edit.obo.tmp mondo-edit.obo
mondo-edit.obo.tmp: tmp/July2023_CUIReports_FromMedGentoMondo.xlsx mondo-edit.obo
python ../scripts/medgen_conflicts_removals.py -i tmp/July2023_CUIReports_FromMedGentoMondo.xlsx -I mondo-edit.obo -o $@
# - MedGen conflicts: adds
.PHONY: address-medgen-conflicts-aug2023-adds
address-medgen-conflicts-aug2023-adds: tmp/report-qc-medgen-conflicts-adds-diff.tsv
$(TEMPLATES_DIR)/ROBOT_addMedGen_fromConflictResolution.tsv: tmp/July2023_CUIReports_FromMedGentoMondo.xlsx
python ../scripts/medgen_conflicts_add_xrefs.py -i tmp/July2023_CUIReports_FromMedGentoMondo.xlsx -o $@
$(TEMPLATES_DIR)/ROBOT_addMedGen_fromIngest.tsv:
wget "https://github.com/monarch-initiative/medgen/releases/latest/download/medgen-xrefs.robot.template.tsv" -O $@
Thats right, thanks for checking! We are just trusting for now! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Remove all rows with MEDGENCUI in it (we only need xrefs to MEDGEN:123 and UMLS:123)
(you checked the box in the ticket, but rows are still in there)
e483136
to
24c6f98
Compare
What happened here is that I did check multiple times on the MedGen side of things, but I marked the release as a "pre release". I thought it still carried the "latest" tag after that, but it removed it. So when I ran the goal here, it did not select the correct release, and I didn't check the results. I fixed that issue, re-ran it, and confirmed there is no more MEDGENCUI. |
24c6f98
to
d818278
Compare
@matentzn I made the appropriate changes in monarch-initiative/medgen#19 and updated the file here. I think this should be good now; maybe take a look. You can also examine as a google sheet: https://docs.google.com/spreadsheets/d/1OD7WUR-nJ9lUqRv01TTghmf3aGvQRbWV5ZhVhthzB6U/edit#gid=1627855620 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed the content in the gSheet linked in PR.
When the xref_id
value is from MESH, there are many cases where the same MESH identifier, e.g. MESH:C000719212
exists more than once in the ROBOT file for different Mondo IDs, e.g. MONDO:0000600
and MONDO:0003736
which I thought caused a proxy merge and was not allowed.
I chatted with @matentzn and the current thought is that the medgen xrefs can be added now, but not the mesh ones. |
d818278
to
673ab54
Compare
NOTE: The issue of proxy merges is being discussed with MedGen is this issue monarch-initiative/mondo-ingest#491 |
673ab54
to
1df0419
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please split the table into MEDGEN and MESH as per #7467 (comment) by @twhetzel, then this is ready.
@matentzn @twhetzel I misunderstood what Trish wrote. I thought this meant that whoever was running the robot template to operate on |
1df0419
to
1cdbcef
Compare
c4ca047
to
a16f971
Compare
I've completed the tasks for splitting out MeSH xrefs as requested. Now ready for re-review / merge. Results: Completed by: |
Looks awesome! But where is the code to support this change? It does not seem to be in this PR? Edit (Joe): Nico said nevermind this comment; he sees the changes now. |
db6b3c0
to
4bef020
Compare
@matentzn This is ready to be merged, but I think you still need to mark resolved your requested changes. The requested change (split MeSH out) has been completed:
|
This PR has a single file in the diff with MESH and Medgen xrefs together - this does not seem to confirm changes are adressed? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review update as of 2024/05/15
- @matentzn Review output again and merge if OK
Thanks @matentzn for teaching me how to comment on a file rather than a line. Let's use this thread to avoid clutter.
This PR has a single file in the diff with MESH and Medgen xrefs together - this does not seem to confirm changes are addressed?
I was really surprised to see you say that! And you're right. I did the old mistake of trusting normal things to not work and, seeing my MedGen repo release coming out OK, I ran the goal that does the wget
over here, and updated this PR, thinking the output worked without checking it.
But that didn't work. Two weird things appear to have happened, and I'm not sure why. I think (1) was sufficient to cause the issue of this file no getting updated, but (2) is also strange.
Possible reasons why it didn't get updated
You probably don't have time to read the below, but I really wanted to know why this didn't update correctly. So feel free to skip if you want.
1. Release not marked latest
Details
The release I pushed out somehow didn't get activated as latest
. So when I ran the goal to update in mondo
, it used the wrong release.
- Another curious thing is that, even though this release happened on 5/08, it was still showing at the very top of releases, even though (i) it was not the release labeled with
latest
, and (ii) it was showing up above the newer release on 5/12. I've since ran a new release today 5/15, and that one is correctly showing aslatest
and also shows at the top of the page. - As an aside, you may notice that the release that was supposed to have been
latest
when I didwget
had the date formatvYYYY-MM-DD
, while most of the other releases hadYYYY-MM-DD
. I just fixed this inconsistency with this commit. Basically the one with thev
was the result of runningdeploy-release
locally, and the other ones are from the GitHub action. Perhaps the fact that it wasdeploy-release
was why it didn't tag it aslatest
, but I don't see why.
2. Git commit of release wrong?
git log
The git commit of some recent releases is not what I would expect.
commit 989160fcb1b2c0ce7ebace1fe791d47d10d6cea2 (tag: 2024-05-15)
Date: Wed May 15 19:42:46 2024 -0400
- Add: MeSH robot template to release
commit 23c78aeadc409fa8faa9f439c76e5caa0eb66012
Date: Sun May 12 21:45:54 2024 -0400
Merge pull request #26 from monarch-initiative/split-mesh
commit 11f03ffbf6cc073b8789d9117088481d9a0676a9 (origin/split-mesh)
Date: Wed May 8 18:17:20 2024 -0400
Split MeSH outputs
commit c0bb5f43391051e0deca596c6287b7cb3a5aac99 (tag: v2024-05-08, tag: 2024-05-12, tag: 2024-05-05)
Date: Sun Apr 28 17:07:31 2024 -0400
Merge pull request #25 from monarch-initiative/fix-odk-version
I merged my PR after these releases ran (tag: v2024-05-08, tag: 2024-05-12, tag: 2024-05-05)
. So I'm not surprised by 5/05 and 5/12 using my unmerged code. but v2024-05-08
, I would think I ran it from the correct commit... I must have run it accidentally from main
. However, if I download the medgen-xrefs.robot.template.tsv
from that release, it shows no MeSH mappings! So either the commit it's showing is wrong, or I ran it on main
and then I edited the release with the correct outputs from my PR that hadn't been merged yet, which I don't remember doing. Really strange. And since medgen-xrefs.robot.template.tsv
was correct here, than I would have expected the wget
to have gotten the correct file. So the cause of the issue must be (1).
There's always occam's razor--that, rather than and in addition to issues (1) and (2) above, it might be that (3) even though I remember specifically updating mondo
after merging the PR, maybe that's a false memory from a different time and I never updated it, or (4) I did update it but I forgot to force push my amended commit; I basically never do that, but never happens sometimes. It looks like I didn't amend my message though either.
Updated outputs
Check now. I've just created a new release, then ran the goal, checked the outputs, confirmed MeSH not there, and pushed a new commit.
4bef020
to
ac26507
Compare
ac26507
to
1e2359a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
addresses monarch-initiative/medgen#15
Changes
Related
Google Sheet - general (no MeSH)
Google Sheet - MeSH only