How to resolve pathway identifiers? #31

Open
cthoyt opened this issue May 13, 2020 · 1 comment


cthoyt commented May 13, 2020

After a closer look, I'm having issues with the data source used for pathways.

Pathway::PC7_2008	BMAL1:CLOCK,NPAS2 activates circadian gene expression	Pathway

This should correspond to the Reactome pathway https://reactome.org/content/detail/R-HSA-1368108, but it's not clear what the identifier PC7_2008 is. I might guess that PC7 means "Pathway Commons 7".

Later, WikiPathways identifiers (plus revisions) are used for other pathways:

Pathway::WP516_r71358	Hypertrophy Model	Pathway

It's nice to have the exact revision _r71358, but this isn't what's necessary to resolve this pathway and merge with other resources.
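To make the point concrete, here is a minimal sketch of separating the resolvable WikiPathways ID from the revision suffix. The regular expression and the function name are assumptions for illustration, based only on the `WP516_r71358` example above; they are not part of Hetionet.

```python
import re

# Assumed shape: "WP" + digits, optionally followed by "_r" + revision digits.
WP_PATTERN = re.compile(r"^(WP\d+)(?:_r(\d+))?$")


def split_wikipathways_identifier(identifier):
    """Return (pathway_id, revision), or None if the identifier is not WikiPathways-style.

    The revision is None when the identifier carries no "_r..." suffix.
    """
    match = WP_PATTERN.match(identifier)
    if match is None:
        return None
    pathway_id, revision = match.groups()
    return pathway_id, (int(revision) if revision is not None else None)


print(split_wikipathways_identifier("WP516_r71358"))  # ('WP516', 71358)
```

Only the first element (`WP516`) is needed to resolve the pathway; the revision can be kept alongside it for provenance.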

I'm also not sure what the actionable item is for this. I don't think you would update the source data, would you? Are there plans for a Hetionet v2.0 that will include some of the other new updates?

Member

dhimmel commented May 13, 2020

For PC7_2008, here's what I found from https://neo4j.het.io/:

MATCH (n:Pathway)
WHERE n.identifier = 'PC7_2008'
RETURN n
<id>:37740
identifier:PC7_2008
license:CC BY 4.0
name:BMAL1:CLOCK,NPAS2 activates circadian gene expression
source:Reactome via Pathway Commons

Pathway resources were combined in this notebook.

The Pathway Commons raw data we used is Pathway Commons.7.All.GSEA.hgnc.gmt, which includes the line:

9606: BMAL1:CLOCK,NPAS2 activates circadian gene expression	datasource: reactome; organism: 9606; id type: hgnc symbol	NAMPT	PPARA	CCRN4L	HELZ2	RORA	NR3C1	CHD9	NPAS2	CRY2	NR1D1	SMARCD3	SERPINE1	PER2	PER1	ARNTL2	BHLHE40	TGS1	BHLHE41	CRY1	TBL1XR1	AVP	RXRA	CREBBP	ARNTL	F7	PPARGC1A	NCOA1	HDAC3	EP300	NCOA2	DBP	NCOA6	TBL1X	CARM1	NCOR1	CLOCK	MED1

So it looks like this file lacked actual pathway identifiers, so I assigned identifiers as incrementing integers prefixed with PC7. Definitely not a good system! I'm not sure whether Pathway Commons now provides the source database's pathway IDs in their data exports.
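As a rough illustration of that scheme (not the notebook's actual code), the sketch below parses GMT lines and numbers them in file order. The function names are hypothetical, and whether the original numbering started at 0 or 1, or happened before or after any filtering, is not stated in this thread; starting at 1 is an assumption.

```python
def parse_gmt_line(line):
    """Split one tab-separated GMT line into name, metadata, and gene symbols."""
    raw_name, description, *genes = line.rstrip("\n").split("\t")
    # The name column looks like "9606: <pathway name>"; drop the taxon prefix.
    _, _, name = raw_name.partition(": ")
    # The description column looks like "datasource: reactome; organism: 9606; ...".
    metadata = dict(item.split(": ", 1) for item in description.split("; "))
    return name, metadata, genes


def assign_identifiers(lines, prefix="PC7"):
    """Yield one record per line with an identifier derived from its position."""
    for index, line in enumerate(lines, start=1):  # start=1 is an assumption
        name, metadata, genes = parse_gmt_line(line)
        yield {
            "identifier": f"{prefix}_{index}",
            "name": name,
            "datasource": metadata.get("datasource"),
            "genes": genes,
        }
```

Because the identifier depends only on row position, reordering or re-exporting the file changes which pathway a given `PC7_` number refers to, which is why these identifiers cannot be resolved against Reactome directly.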

> It's nice to have the exact revision _r71358, but this isn't what's necessary to resolve this pathway and merge with other resources.

Agree this would be best as a separate revision property rather than as part of the node identifier.

> I'm also not sure what the actionable item is for this. I don't think you would update the source data, would you? Are there plans for a Hetionet v2.0 that will include some of the other new updates?

I'm not currently working on Hetionet v2.0. If someone wants to take the lead, I'd be happy to advise and support. There's lots of low-hanging fruit, like updating resources and adding more properties (such as CURIEs and URLs where missing).

I think one actionable item from your comment is that it would be nice to have a mapping from each Hetionet v1.0 node to a CURIE for that node. Nodes would get an extra curie property, so the change would be backwards compatible. In the case of the Pathway Commons nodes, this might actually be a bit annoying to generate.
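A minimal sketch of what generating that curie property could look like for Pathway nodes follows. Everything here is an assumption for illustration: the function name, the `wikipathways` and `reactome` prefixes, and the idea of matching Pathway Commons nodes to Reactome by pathway name. The lookup table contains only the single pair mentioned in this thread.

```python
import re

# Hypothetical name-to-Reactome lookup; a real one would have to be built
# from a Reactome export, and name matching may fail where names have changed.
REACTOME_ID_BY_NAME = {
    "BMAL1:CLOCK,NPAS2 activates circadian gene expression": "R-HSA-1368108",
}


def pathway_curie(identifier, name):
    """Return an assumed CURIE for a Hetionet Pathway node, or None if unresolved."""
    # WikiPathways nodes: drop the "_r<revision>" suffix, keep the stable ID.
    wp_match = re.match(r"^(WP\d+)(?:_r\d+)?$", identifier)
    if wp_match:
        return f"wikipathways:{wp_match.group(1)}"
    # Pathway Commons nodes: the PC7_ number carries no meaning, so fall back
    # to a name lookup (only valid for nodes whose source is Reactome).
    if identifier.startswith("PC7_"):
        reactome_id = REACTOME_ID_BY_NAME.get(name)
        if reactome_id is not None:
            return f"reactome:{reactome_id}"
    return None


print(pathway_curie("WP516_r71358", "Hypertrophy Model"))
print(pathway_curie("PC7_2008", "BMAL1:CLOCK,NPAS2 activates circadian gene expression"))
```

The WikiPathways branch is mechanical, while the Pathway Commons branch needs an external lookup, which is the annoying part.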
