
Generate score based on a single node instead of aggregating the whole graph #398

gonzalo-menendez opened this issue Oct 21, 2022 · 8 comments



gonzalo-menendez commented Oct 21, 2022

Hi!
I currently have a model (implemented via subclassing) that has 2 ECCConv layers, then aggregates the node embeddings with a global sum and runs the result through an NN to get a score value. I also use DisjointLoader to batch graphs together.
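For reference, my setup looks roughly like this (a simplified sketch, not my exact code; channel sizes are placeholders):

```python
import tensorflow as tf
from spektral.layers import ECCConv, GlobalSumPool

class GraphScorer(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.conv1 = ECCConv(32, activation="relu")
        self.conv2 = ECCConv(32, activation="relu")
        self.pool = GlobalSumPool()
        self.dense = tf.keras.layers.Dense(1)

    def call(self, inputs):
        # Disjoint mode: x = node features, a = adjacency, e = edge features,
        # i = index of the graph each node belongs to
        x, a, e, i = inputs
        x = self.conv1([x, a, e])
        x = self.conv2([x, a, e])
        x = self.pool([x, i])  # aggregate node embeddings per graph
        return self.dense(x)   # one score per graph
```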

I'd like to try generating that score based only on one of the nodes' encodings, instead of aggregating the whole graph.

I was hoping you could recommend what the best way of doing this would be, since none of the implemented pooling layers seem to do this.

Thanks!

@FinAminToastCrunch

FinAminToastCrunch commented Oct 22, 2022

I might be misunderstanding what you're asking. Are you asking how to get a node-level prediction? Because that is explained in the docs.

If you're trying to get the output for the entire graph FROM a single node, then you can add one dummy node to your graph that is used for that.

@gonzalo-menendez

I think it is sort of a node-level prediction, but I only have the target for one node per graph. I might have misunderstood what I've read about node-level prediction, but my understanding is that in those cases you have one big graph and a target value for each node, so you train based on the loss for each node.

My situation is a bit different. I'm working with small graphs, each with one "main node", so I only have one target value per graph. Currently I've been using a global sum to get an embedding and generate a score for each of those small graphs, which I can compare with the target value. But I'd like to try using only the main node's embedding instead.

@FinAminToastCrunch

Is there a way you can mask the other node embeddings with a 0 or some dummy variable? That way your pooling layer only gets the value from your important node?

gonzalo-menendez commented Oct 24, 2022

It might be possible, but I'm not sure how I'd go about doing that. It's a bit tricky since the DisjointLoader joins the embeddings of the different graphs one after the other, so it's hard to know which element is the main node for each graph. And since models built via subclassing run without eager execution, I can't use eager functions to check the values of the index tensor that indicates which graph each node belongs to.

I also tried something similar where I divided the batch using tf.dynamic_partition and tried to keep the first node of each graph, but I got a lot of warnings that some weights had gradient None, which I assume has to do with the fact that the final embeddings of the other nodes weren't being considered in the loss function.

@FinAminToastCrunch

"Since when building a model via subclassing tensorflow runs without eager excecution, I can't use eager functions to check the values of the index tensor that indicates which graph each node belongs to."

I think a workaround is to have the index be an output of one of your layers.
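Something like this, roughly (an untested sketch, reusing the layers from the model you described):

```python
def call(self, inputs):
    x, a, e, i = inputs
    h = self.conv1([x, a, e])
    h = self.conv2([h, a, e])
    out = self.dense(self.pool([h, i]))
    # Returning i alongside the prediction lets you inspect which graph each
    # node belongs to, even though call() doesn't run eagerly.
    return out, i
```

Alternatively, a tf.print(i) inside call() will show the values even in graph mode.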

@danielegrattarola

Sorry for the late reply but for some reason I stopped receiving notifications.

To achieve what you want, you need some way of tracking which node is the main node of each graph; otherwise it's not possible to select it to compute the loss. As @FinAminToastCrunch suggested, you can create a mask by stacking it as a dummy node feature and using it when you do global pooling.
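Something along these lines (a sketch, assuming you add a 0/1 column to the node features that is 1 only for the main node of each graph):

```python
def call(self, inputs):
    x, a, e, i = inputs
    flag = x[:, -1:]           # (n_nodes, 1) indicator for the main node
    h = self.conv1([x, a, e])
    h = self.conv2([h, a, e])
    h = h * flag               # zero out every node except the main one
    h = self.pool([h, i])      # GlobalSumPool over each graph
    return self.dense(h)
```

Since all other rows are zeroed out, the per-graph sum is exactly the main node's embedding.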

@gonzalo-menendez

No worries, thanks for the reply!
I currently have a feature identifying the node. Would using SortPool to keep the node with the highest value in this flag feature achieve the desired result? What would be the ideal strategy for pooling?

@danielegrattarola

Sure, if the flag feature is the last column of the node feature matrix and you set k=1 in SortPool, it should work.
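Roughly like this (a sketch; it assumes the 0/1 flag is the last column of the input node features and gets re-appended after the convolutions, since the ECCConv layers transform the features):

```python
from spektral.layers import SortPool

# in __init__:
# self.pool = SortPool(k=1)  # keeps the node with the largest value in the last column

def call(self, inputs):
    x, a, e, i = inputs
    flag = x[:, -1:]                   # 0/1 main-node indicator
    h = self.conv1([x, a, e])
    h = self.conv2([h, a, e])
    h = tf.concat([h, flag], axis=-1)  # put the flag back as the last column
    h = self.pool([h, i])              # shape (batch, 1, channels + 1)
    return self.dense(tf.squeeze(h, axis=1))
```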
