Error with RandomEdgeSplit for multilabel edge classification #9262
Comments
I was able to reproduce this problem with a minimal example. The root cause is that when [...] (see pytorch_geometric/torch_geometric/transforms/random_link_split.py, lines 223 to 232 in d2f6eba).
Unfortunately, this not only adds negative edges to the [...]. This seems like a very confusing kwarg, and possibly an unintended result? Would be happy to submit a PR to try to fix this.
@sadrahkm A quick workaround is to pass the kwarg [...]
Thank you @keeganq for your help. Right, I hadn't noticed that the [...]. Yes, putting [...]
I think if we want to have negative samples for train/val/test sets, there would be a problem with this issue. Because in that case, we would have to set [...]
🐛 Describe the bug
Recently, I've been dealing with a multi-label edge classification problem. In other words, an edge can have more than one label. So I implemented a simple GNN model to see if I get good results or not.
I have 935 label types and have encoded them using MultiLabelBinarizer from sklearn. I have checked that every encoded label is 0 or 1.
But after splitting the edges using RandomEdgeSplit, I noticed that there are more than two label values in the validation and test sets. The train set contains only 0 and 1, but the validation set contains 0, 1, and 2, which makes the work harder. In the following screenshot, I have shown this. The first cell is the original data, encoded with MultiLabelBinarizer. The next three cells are the train/val/test sets, respectively, split using the RandomEdgeSplit call that I've provided in the code block.
For example, I want to compute the AUC score during testing. I have attached the code and the errors that I got. I don't know what to do, or why the edge splitter returns more than two label values; I think it should only return 0 or 1. I would appreciate your help in this regard.
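A quick way to confirm the symptom described above is to list the distinct label values in each split. The following is only a minimal sketch on made-up toy data, not the code from this issue: the helper names `label_values` and `is_binary` and the `splits` dictionary are hypothetical, and plain nested lists stand in for the real label tensors.

```python
from typing import Iterable, Set


def label_values(rows: Iterable[Iterable[float]]) -> Set[float]:
    """Collect the distinct values found in a 2-D multi-hot label matrix."""
    return {value for row in rows for value in row}


def is_binary(rows: Iterable[Iterable[float]]) -> bool:
    """Return True only if every entry of the label matrix is 0 or 1."""
    return label_values(rows) <= {0, 1}


# Toy stand-ins for the per-split label matrices (made-up data, one row
# per edge, one column per label type).
splits = {
    "train": [[0, 1, 0], [1, 1, 0]],
    "val": [[0, 2, 1], [1, 1, 2], [0, 0, 0]],
}

for name, rows in splits.items():
    status = "binary" if is_binary(rows) else "NOT binary"
    print(name, sorted(label_values(rows)), status)
```

With real data, a label tensor can be converted with its `.tolist()` method before being passed in, for example `val_data.edge_label.tolist()`, where the attribute name `edge_label` is an assumption about how the labels are stored. A split that reports a value other than 0 or 1 cannot be passed directly to a multilabel AUC computation.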
Versions
Collecting environment information...
PyTorch version: 2.2.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Debian GNU/Linux 12 (bookworm) (x86_64)
GCC version: (Debian 12.2.0-14) 12.2.0
Clang version: Could not collect
CMake version: version 3.25.1
Libc version: glibc-2.36
Python version: 3.11.2 (main, Mar 13 2023, 12:18:29) [GCC 12.2.0] (64-bit runtime)
Python platform: Linux-6.1.0-20-amd64-x86_64-with-glibc2.36
...