Inquiry about metapaths from 2017 Paper "Systematic Integration of Biomedical Knowledge Prioritizes Drugs for Repurposing" #58

Open
ferry309 opened this issue Nov 2, 2023 · 7 comments

@ferry309

ferry309 commented Nov 2, 2023

Hi, I am a postgraduate student working on domain adaptation of pre-trained language models. I've been following your work on biomedical data integration.

I was particularly intrigued by your 2017 paper, "Systematic Integration of Biomedical Knowledge Prioritizes Drugs for Repurposing." In it, you mention that 709 of the 1,206 metapaths exhibited a statistically significant Δ AUROC at a false discovery rate cutoff of 5%. However, while trying to replicate some of the results and explore the open-source data, I was unable to locate these 709 metapaths. Would it be possible for you to provide the specific metapaths and their instance paths? I am keen to explore these paths further, and your assistance would be a great help as I continue working in the biomedical domain.

@dhimmel
Member

dhimmel commented Nov 2, 2023

Quoting from the manuscript:

> Overall, 709 of the 1,206 metapaths exhibited a statistically significant Δ AUROC at a false discovery rate cutoff of 5%. These 709 metapaths included all 24 metaedges, suggesting that each type of relationship we integrated provided at least some therapeutic utility.

> I was unable to locate these 709 metapaths.

We have an interactive table of the metapaths here, but it doesn't appear to include the FDR-adjusted p-values.

I think the dataset you want is all-features/data/feature-performance/auroc.tsv. We then computed the FDR using the following R command in 6-rvisualize.ipynb:

fdr_delta_auroc = p.adjust(p = pval_delta_auroc, method = 'fdr')

I think we also saved the FDR-adjusted p-values in 5-primary-aucs.ipynb to data/feature-performance/primary-aurocs.tsv. If you filter this dataset to feature_type == "dwpc" and fdr_pval_auroc < 0.05 (the 5% false discovery rate cutoff), I hope you get 709 rows 😃
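For readers working in Python, here is a minimal sketch of the same two steps: a Benjamini-Hochberg adjustment (which is what R's `p.adjust` computes for `method = 'fdr'`) followed by the filter. The row keys `feature`, `pval`, and `fdr_pval` and the toy values are assumptions for illustration, not columns or data from the actual TSV files. The sketch also does not settle which set of features the original adjustment was run over.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (R's p.adjust with method='fdr')."""
    n = len(pvals)
    # Visit p-values from largest to smallest so that a running minimum
    # keeps the adjusted values monotone in the original p-values.
    order = sorted(range(n), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, i in enumerate(order):
        rank = n - position  # 1-based rank of pvals[i] in ascending order
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def significant_dwpc(rows, cutoff=0.05):
    """Keep DWPC features whose adjusted p-value is below the cutoff."""
    return [
        row for row in rows
        if row["feature_type"] == "dwpc" and row["fdr_pval"] < cutoff
    ]


# Hypothetical toy rows; names and p-values are made up for illustration.
rows = [
    {"feature": "m1", "feature_type": "dwpc", "pval": 0.001},
    {"feature": "m2", "feature_type": "dwpc", "pval": 0.02},
    {"feature": "m3", "feature_type": "dwpc", "pval": 0.2},
    {"feature": "d1", "feature_type": "degree", "pval": 0.04},
]

# Adjust across all rows, then attach the result to each row.
for row, fdr in zip(rows, bh_adjust([row["pval"] for row in rows])):
    row["fdr_pval"] = fdr

hits = significant_dwpc(rows)
```

Note that a 5% cutoff corresponds to `cutoff=0.05`; a looser threshold such as 0.5 would admit many more rows.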

@ferry309
Author

ferry309 commented Nov 3, 2023

Thank you very much for your prompt reply.

I have successfully identified 1069 metapaths that meet the above criteria. My next objective is to find the instance paths for these metapaths. From my understanding, and based on the information you've provided, it seems you have generated query statements for each metapath to measure their effectiveness as features. Do you have the instance paths generated during the query process for metapaths?

If these data are not available, would I need to run the queries individually on Neo4j to retrieve the paths for every metapath? Given that the Neo4j instance at https://neo4j.het.io/ often times out, this approach seems impractical.

Could you advise on the best course of action to obtain these data? Any suggestions or alternative methods you could provide would be immensely helpful.

@dhimmel
Member

dhimmel commented Nov 11, 2023

> Do you have the instance paths generated during the query process for metapaths?

We do not store actual paths corresponding to source node, target node, metapath combinations. Instead we generate them on the fly via Cypher queries to Neo4j.

When the path count is large, i.e. over 10,000, then I don't suggest trying to generate all paths. I don't see a valid use case for generating such a large number of paths though. When the path count is that large, any individual path tends to be pretty meaningless.
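As a rough illustration of generating paths on the fly, the sketch below builds a Cypher query string for one source node, target node, and metapath, with a `LIMIT` so that very large path counts are never fully enumerated. The node labels, relationship type names, the `name` property, and the helper `metapath_to_cypher` are all assumptions made for this example and should be checked against the actual database schema. It is not the query generator used in the project.

```python
def metapath_to_cypher(node_labels, rel_types, limit=10_000):
    """Build a Cypher query that returns paths following one metapath.

    node_labels: labels of the metanodes along the path, in order.
    rel_types:   relationship types between consecutive metanodes.
    limit:       maximum number of paths to return.
    """
    if len(node_labels) != len(rel_types) + 1:
        raise ValueError("need exactly one more node label than relationship type")
    # Every relationship is matched without a direction here; a directed
    # metaedge such as Gene->regulates->Gene would need an arrow instead.
    pattern = f"(n0:{node_labels[0]})"
    for i, (rel, label) in enumerate(zip(rel_types, node_labels[1:]), start=1):
        pattern += f"-[:{rel}]-(n{i}:{label})"
    last = len(rel_types)
    return (
        f"MATCH path = {pattern}\n"
        f"WHERE n0.name = $source AND n{last}.name = $target\n"
        f"RETURN path\n"
        f"LIMIT {int(limit)}"
    )


# Assumed label and relationship names, for illustration only.
query = metapath_to_cypher(
    ["Compound", "Gene", "Disease"],
    ["BINDS_CbG", "ASSOCIATES_DaG"],
    limit=100,
)
print(query)
```

The `$source` and `$target` placeholders are query parameters to be supplied by whichever Neo4j client runs the query.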

Also noting our recent publication, "Hetnet connectivity search provides rapid insights into how two biomedical entities are related."

@ferry309
Author

Thanks a lot! I also have a question about the undirected metaedges in the paper. You mention them in the last sentence of the first paragraph on page 7: "Note that all metaedges besides Gene->regulates->Gene are undirected." Take Anatomy–upregulates–Gene as an example: we cannot say Gene->upregulates->Anatomy, only Anatomy->upregulates->Gene. Isn't this just a directed edge?

@dhimmel
Member

dhimmel commented Nov 22, 2023

> question about the undirected metaedges in the paper

See related issue #23.

Whether a metaedge/edge is directional or symmetric is a distinction that is most relevant when the source and target metanode are the same. When there are different source and target metanodes, we encoded "directionality" as different metaedges like:

  • Anatomy–upregulates–Gene
  • Anatomy–downregulates–Gene

@ferry309
Author

So you use different edge types between the same pair of node types to express directionality. However, the Compound and Disease pair does not have separate edge types for each direction. Instead, the same edge type is traversed in reverse within a metapath, e.g., Compound–palliates–Disease–palliates–Compound–treats–Disease. So I'm confused about how to distinguish the direction, or whether all edges in the meta-knowledge graph are bidirectional, even Anatomy–upregulates–Gene and Anatomy–downregulates–Gene.

@dhimmel
Member

dhimmel commented Dec 26, 2023

Compound–palliates–Disease and Disease–palliates–Compound are the same edge type, just with different orientations. There is no difference in the semantic meaning between the two, which is why we consider the bipartite edges in Hetionet as bidirectional.
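One way to picture this is a lookup key that ignores orientation for undirected edge types and preserves it for directed ones. The sketch below is only an illustration of that idea; the function `edge_key`, the node identifiers, and the `directed` flag are hypothetical and do not describe how Hetionet actually stores edges.

```python
def edge_key(source, kind, target, directed=False):
    """Return a key that identifies an edge regardless of how it is written.

    For an undirected edge the two endpoints are sorted, so both
    orientations map to the same key. For a directed edge the order
    of the endpoints is kept, so reversing it gives a different key.
    """
    if directed:
        return (source, kind, target)
    first, second = sorted([source, target])
    return (first, kind, second)


# Hypothetical node identifiers, for illustration only.
forward = edge_key("Compound::X", "palliates", "Disease::Y")
reverse = edge_key("Disease::Y", "palliates", "Compound::X")
same_undirected = forward == reverse  # both orientations are one edge

# A directed metaedge such as Gene->regulates->Gene keeps its orientation.
a_regulates_b = edge_key("Gene::A", "regulates", "Gene::B", directed=True)
b_regulates_a = edge_key("Gene::B", "regulates", "Gene::A", directed=True)
same_directed = a_regulates_b == b_regulates_a  # two distinct edges
```

Under this view, a metapath such as Compound–palliates–Disease–palliates–Compound simply walks the same undirected edge type twice, once in each orientation.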
