Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Finalise MedGen xref table - Updated MedGen ROBOT template #7467

Merged
merged 2 commits into from
May 23, 2024

Conversation

joeflack4
Copy link
Collaborator

@joeflack4 joeflack4 commented Mar 24, 2024

addresses monarch-initiative/medgen#15

Changes

  • Update: MedGen ROBOT template - with some fixes, updates, and now also includes MeSH mappings

Related


Google Sheet - general (no MeSH)
Google Sheet - MeSH only

Copy link
Collaborator Author

@joeflack4 joeflack4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matentzn Just double checking, we don't need to worry about "conflicts" anymore, right? At this point, we're trusting MedGen as the source for these mappings, so I don't need to run any such analysis, yes?

Context: Previous ad-hoc MedGen conflicts goals

# MedGen conflicts (Aug 2023) pipeline
# - https://github.com/monarch-initiative/mondo-ingest/issues/273
.PHONY: address-medgen-conflicts-aug2023
address-medgen-conflicts-aug202x3: address-medgen-conflicts-aug2023-deletes address-medgen-conflicts-aug2023-adds

# - MedGen conflicts: dependencies
tmp/July2023_CUIReports_FromMedGentoMondo.xlsx:
    mkdir -p tmp && wget "https://github.com/monarch-initiative/mondo-ingest/files/12029712/July2023_CUIReports_FromMedGentoMondo.xlsx" -O $@

# - MedGen conflicts: analysis
tmp/mondo-edit.obo.tmp.diff: mondo-edit.obo.tmp
    -diff mondo-edit.obo mondo-edit.obo.tmp > $@

tmp/report-qc-medgen-conflicts-update-diff.tsv: tmp/mondo-edit.obo.tmp.diff
    python ../scripts/medgen_conflicts_removals_diff_analysis.py -i tmp/mondo-edit.obo.tmp.diff -o $@

tmp/report-qc-medgen-conflicts-adds-diff.tsv: $(TEMPLATES_DIR)/ROBOT_addMedGen_fromConflictResolution.tsv $(TEMPLATES_DIR)/ROBOT_addMedGen_fromIngest.tsv
    python ../scripts/medgen_conflicts_add_xrefs_diff_analysis.py -c $(TEMPLATES_DIR)/ROBOT_addMedGen_fromConflictResolution.tsv -i $(TEMPLATES_DIR)/ROBOT_addMedGen_fromIngest.tsv -o $@

# - MedGen conflicts: deletes
.PHONY: address-medgen-conflicts-aug2023-deletes
address-medgen-conflicts-aug2023-deletes: mondo-edit.obo.tmp
    mv mondo-edit.obo.tmp mondo-edit.obo

mondo-edit.obo.tmp: tmp/July2023_CUIReports_FromMedGentoMondo.xlsx mondo-edit.obo
    python ../scripts/medgen_conflicts_removals.py -i tmp/July2023_CUIReports_FromMedGentoMondo.xlsx -I mondo-edit.obo -o $@

# - MedGen conflicts: adds
.PHONY: address-medgen-conflicts-aug2023-adds
address-medgen-conflicts-aug2023-adds: tmp/report-qc-medgen-conflicts-adds-diff.tsv

$(TEMPLATES_DIR)/ROBOT_addMedGen_fromConflictResolution.tsv: tmp/July2023_CUIReports_FromMedGentoMondo.xlsx
    python ../scripts/medgen_conflicts_add_xrefs.py -i tmp/July2023_CUIReports_FromMedGentoMondo.xlsx -o $@

$(TEMPLATES_DIR)/ROBOT_addMedGen_fromIngest.tsv:
    wget "https://github.com/monarch-initiative/medgen/releases/latest/download/medgen-xrefs.robot.template.tsv" -O $@

@matentzn
Copy link
Member

@matentzn Just double checking, we don't need to worry about "conflicts" anymore, right? At this point, we're trusting MedGen as the source for these mappings, so I don't need to run any such analysis, yes?

Thats right, thanks for checking! We are just trusting for now!

Copy link
Member

@matentzn matentzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Remove all rows with MEDGENCUI in it (we only need xrefs to MEDGEN:123 and UMLS:123)

(you checked the box in the ticket, but rows are still in there)

@joeflack4
Copy link
Collaborator Author

(you checked the box in the ticket, but rows are still in there)
@matentzn Glad to have your eyes on these, sorry about my error rate!

What happened here is that I did check multiple times on the MedGen side of things, but I marked the release as a "pre release". I thought it still carried the "latest" tag after that, but it removed it. So when I ran the goal here, it did not select the correct release, and I didn't check the results.

I fixed that issue, re-ran it, and confirmed there is no more MEDGENCUI.

@joeflack4 joeflack4 changed the title Updated MedGen ROBOT template Finalise MedGen xref table - Updated MedGen ROBOT template Apr 3, 2024
@joeflack4
Copy link
Collaborator Author

@matentzn I made the appropriate changes in monarch-initiative/medgen#19 and updated the file here. I think this should be good now; maybe take a look. You can also examine as a google sheet: https://docs.google.com/spreadsheets/d/1OD7WUR-nJ9lUqRv01TTghmf3aGvQRbWV5ZhVhthzB6U/edit#gid=1627855620

Copy link
Collaborator

@twhetzel twhetzel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the content in the gSheet linked in PR.

When the xref_id value is from MESH, there are many cases where the same MESH identifier, e.g. MESH:C000719212 exists more than once in the ROBOT file for different Mondo IDs, e.g. MONDO:0000600 and MONDO:0003736 which I thought caused a proxy merge and was not allowed.

@twhetzel
Copy link
Collaborator

I chatted with @matentzn and the current thought is that the medgen xrefs can be added now, but not the mesh ones.

@twhetzel
Copy link
Collaborator

NOTE: The issue of proxy merges is being discussed with MedGen is this issue monarch-initiative/mondo-ingest#491

Copy link
Member

@matentzn matentzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please split the table into MEDGEN and MESH as per #7467 (comment) by @twhetzel, then this is ready.

@joeflack4
Copy link
Collaborator Author

joeflack4 commented May 4, 2024

  • @joeflack4 medgen repo: Move MeSH xrefs into a secondary robot template which will not currently be used.
  • @joeflack4 mondo repo: Pull down into this PR the new, updated original robot template, which will no longer have MeSH xrefs

@matentzn @twhetzel I misunderstood what Trish wrote. I thought this meant that whoever was running the robot template to operate on mondo-edit.obo would do something to avoid merging in the MeSH terms, and that this was an FYI. I understand the task now. Easy fix; will get to this next week!

@joeflack4
Copy link
Collaborator Author

joeflack4 commented May 8, 2024

I've completed the tasks for splitting out MeSH xrefs as requested. Now ready for re-review / merge.

Results:
Google Sheet - general (no MeSH)
Google Sheet - MeSH only

Completed by:

@matentzn
Copy link
Member

matentzn commented May 9, 2024

Looks awesome! But where is the code to support this change? It does not seem to be in this PR?

Edit (Joe): Nico said nevermind this comment; he sees the changes now.

@joeflack4 joeflack4 force-pushed the medgen-robot-update branch 2 times, most recently from db6b3c0 to 4bef020 Compare May 15, 2024 11:44
@joeflack4
Copy link
Collaborator Author

@matentzn This is ready to be merged, but I think you still need to mark resolved your requested changes.

The requested change (split MeSH out) has been completed:

  1. here in this PR
  2. In the related MedGen PR (merged already)

@matentzn
Copy link
Member

This PR has a single file in the diff with MESH and Medgen xrefs together - this does not seem to confirm changes are adressed?

Copy link
Collaborator Author

@joeflack4 joeflack4 May 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review update as of 2024/05/15

  • @matentzn Review output again and merge if OK

Thanks @matentzn for teaching me how to comment on a file rather than a line. Let's use this thread to avoid clutter.

Nico wrote:

This PR has a single file in the diff with MESH and Medgen xrefs together - this does not seem to confirm changes are addressed?

I was really surprised to see you say that! And you're right. I did the old mistake of trusting normal things to not work and, seeing my MedGen repo release coming out OK, I ran the goal that does the wget over here, and updated this PR, thinking the output worked without checking it.

But that didn't work. Two weird things appear to have happened, and I'm not sure why. I think (1) was sufficient to cause the issue of this file no getting updated, but (2) is also strange.

Possible reasons why it didn't get updated

You probably don't have time to read the below, but I really wanted to know why this didn't update correctly. So feel free to skip if you want.

1. Release not marked latest

Details

The release I pushed out somehow didn't get activated as latest. So when I ran the goal to update in mondo, it used the wrong release.

  • Another curious thing is that, even though this release happened on 5/08, it was still showing at the very top of releases, even though (i) it was not the release labeled with latest, and (ii) it was showing up above the newer release on 5/12. I've since ran a new release today 5/15, and that one is correctly showing as latest and also shows at the top of the page.
  • As an aside, you may notice that the release that was supposed to have been latest when I did wget had the date format vYYYY-MM-DD, while most of the other releases had YYYY-MM-DD. I just fixed this inconsistency with this commit. Basically the one with the v was the result of running deploy-release locally, and the other ones are from the GitHub action. Perhaps the fact that it was deploy-release was why it didn't tag it as latest, but I don't see why.

2. Git commit of release wrong?

git log

The git commit of some recent releases is not what I would expect.

commit 989160fcb1b2c0ce7ebace1fe791d47d10d6cea2 (tag: 2024-05-15)
Date:   Wed May 15 19:42:46 2024 -0400
    - Add: MeSH robot template to release

commit 23c78aeadc409fa8faa9f439c76e5caa0eb66012
Date:   Sun May 12 21:45:54 2024 -0400
    Merge pull request #26 from monarch-initiative/split-mesh

commit 11f03ffbf6cc073b8789d9117088481d9a0676a9 (origin/split-mesh)
Date:   Wed May 8 18:17:20 2024 -0400
    Split MeSH outputs

commit c0bb5f43391051e0deca596c6287b7cb3a5aac99 (tag: v2024-05-08, tag: 2024-05-12, tag: 2024-05-05)
Date:   Sun Apr 28 17:07:31 2024 -0400
    Merge pull request #25 from monarch-initiative/fix-odk-version

I merged my PR after these releases ran (tag: v2024-05-08, tag: 2024-05-12, tag: 2024-05-05). So I'm not surprised by 5/05 and 5/12 using my unmerged code. but v2024-05-08, I would think I ran it from the correct commit... I must have run it accidentally from main. However, if I download the medgen-xrefs.robot.template.tsv from that release, it shows no MeSH mappings! So either the commit it's showing is wrong, or I ran it on main and then I edited the release with the correct outputs from my PR that hadn't been merged yet, which I don't remember doing. Really strange. And since medgen-xrefs.robot.template.tsv was correct here, than I would have expected the wget to have gotten the correct file. So the cause of the issue must be (1).


There's always occam's razor--that, rather than and in addition to issues (1) and (2) above, it might be that (3) even though I remember specifically updating mondo after merging the PR, maybe that's a false memory from a different time and I never updated it, or (4) I did update it but I forgot to force push my amended commit; I basically never do that, but never happens sometimes. It looks like I didn't amend my message though either.

Updated outputs

Check now. I've just created a new release, then ran the goal, checked the outputs, confirmed MeSH not there, and pushed a new commit.

Copy link
Member

@matentzn matentzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@joeflack4 joeflack4 merged commit 4254127 into master May 23, 2024
1 check passed
@joeflack4 joeflack4 deleted the medgen-robot-update branch May 23, 2024 23:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants