Duplicate edges in Hetionet #13
Comments
At the moment I'm bypassing the error by enclosing the
@veleritas you're getting the
Hopefully we can diagnose your issue, so you can remove the error handling here.
It's a combinatorial explosion! Not sure if that counts as exponential. The reason the 5 edges you mention have such a huge effect on the total number of possible metapaths is that they connect genes, compounds, and diseases, which also have lots of other metaedges. In the future, I could see some heuristic method that only computed DWPCs for metapaths that were likely to provide novel information.
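To make the growth concrete, here is a minimal sketch that counts metapaths on a toy metagraph. The metanodes and metaedges below are a made-up simplification, not Hetionet's actual metagraph, and `count_metapaths` is a hypothetical helper rather than anything from the project's code. It counts every walk over metanodes from a source to a target, treating parallel metaedges between the same pair of metanodes as distinct steps.

```python
from collections import defaultdict


def count_metapaths(metaedges, source, target, max_length):
    """Count walks over metanodes from source to target using 1..max_length metaedges.

    Simplification (assumption): every metaedge is traversable in both
    directions, and a self-metaedge such as Gene-Gene counts once per step.
    """
    # Adjacency list with one entry per metaedge, so parallel metaedges
    # between the same two metanodes are counted as separate options.
    neighbors = defaultdict(list)
    for a, b, _kind in metaedges:
        neighbors[a].append(b)
        if a != b:
            neighbors[b].append(a)

    # walks[node] = number of distinct walks of the current length
    # that start at source and end at node.
    walks = {source: 1}
    total = 0
    for _ in range(max_length):
        next_walks = defaultdict(int)
        for node, n_walks in walks.items():
            for nxt in neighbors[node]:
                next_walks[nxt] += n_walks
        walks = next_walks
        total += walks.get(target, 0)
    return total


# Toy metagraph (hypothetical, far smaller than the real one).
base_metaedges = [
    ("Compound", "Disease", "treats"),
    ("Compound", "Gene", "binds"),
    ("Disease", "Gene", "associates"),
    ("Gene", "Gene", "interacts"),
]

# Four extra metaedges that all touch the well-connected metanodes.
regulation_metaedges = [
    ("Compound", "Gene", "upregulates"),
    ("Compound", "Gene", "downregulates"),
    ("Disease", "Gene", "upregulates"),
    ("Disease", "Gene", "downregulates"),
]

without_regulation = count_metapaths(base_metaedges, "Compound", "Disease", 3)
with_regulation = count_metapaths(
    base_metaedges + regulation_metaedges, "Compound", "Disease", 3
)
print(without_regulation, with_regulation)
```

Because each extra metaedge multiplies the options at every step that passes through Compound, Disease, or Gene, the count grows much faster than the number of metaedges added.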
So I went back to see if I could pin down the reason why we seem to be getting different results. On a fresh Ubuntu 16.04 instance I have confirmed that (I am using Anaconda 4.3.1 for these tests). However, if you update the packages in the integrate environment through a Here's the
My guess is that some pandas behavior has changed. Can you see which rows are duplicated using the following:
l1000_df[l1000_df.duplicated(['perturbagen', 'entrez_gene_id'], keep=False)]
stargeo_df[stargeo_df.duplicated(['slim_id', 'entrez_gene_id'], keep=False)]
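For anyone unsure what `keep=False` selects, the following pandas-free sketch shows the same idea on toy data: every row whose key columns occur more than once is returned, including the first occurrence. The helper name `rows_with_duplicated_keys`, the `value` column, and the row contents are all hypothetical and only there for illustration.

```python
from collections import Counter


def rows_with_duplicated_keys(rows, key_columns):
    """Return every row whose key-column values appear in more than one row."""
    keys = [tuple(row[column] for column in key_columns) for row in rows]
    counts = Counter(keys)
    # Keep all members of a duplicated group, not just the later repeats.
    return [row for row, key in zip(rows, keys) if counts[key] > 1]


# Toy rows (made up); the first two share the same key pair.
toy_rows = [
    {"perturbagen": "A", "entrez_gene_id": 1, "value": 2.5},
    {"perturbagen": "A", "entrez_gene_id": 1, "value": -3.1},
    {"perturbagen": "B", "entrez_gene_id": 2, "value": 4.0},
]

duplicates = rows_with_duplicated_keys(toy_rows, ["perturbagen", "entrez_gene_id"])
for row in duplicates:
    print(row)
```

Seeing both members of each duplicated pair side by side makes it easier to tell whether the rows are exact copies or conflicting measurements for the same key.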
Version changes frequently break things! If you want to update a dependency for an existing codebase, I'd do it one at a time and carefully. I wouldn't recommend That being said, I'm happy to implement a forward compatible syntax if we can figure out what the bug is.
I can try to figure out what changed to cause these duplicate edges, but that will probably take a few days as I work through other priorities.
Up to you. The motivation to diagnose it rather than use error handling is the possibility that it's part of a bigger problem... but if you're getting the expected number of edges, it's probably not a huge issue.
Hi Daniel,
Just wanted to note that there are still duplicate edges in hetionet in the newest integrate.ipynb. Specifically, the following two types of relationships give duplicate edge errors when the notebook is run:
Disease-gene differential expression edges
LINCS Compound-gene dysregulation edges
Also, is the metaedge generation supposed to be exponential with the number of metapaths in the network? I noticed that if I don't include these types of metapaths in the network, but include everything else, then the number of metapaths drops from 1200 to only 130.
The four regulation metapaths were not included due to the edge import errors, and the palliates one due to my excluding them for testing purposes.