
How to evaluate a graph regression model with new data? #428

Open
alfonsogijon opened this issue Mar 22, 2023 · 4 comments

alfonsogijon commented Mar 22, 2023

Hi, I slightly modified the qm9_ecc.py example (in disjoint mode) to build a graph regression model that predicts the energy of molecules (my dataset is different from QM9). For each molecular geometry I define a graph with graph = Graph(x=x, a=a, e=e, y=energy), which I add to the graph dataset. It works pretty well when I evaluate the model on the test dataset with a disjoint loader, but now I want to use this model to predict the energy of new data.

I would like to make predictions on new molecules (graphs) one by one, because I need to know the energy of one molecule to generate the next. From a molecular geometry I define the data matrices of the graph, (x, a, e, y), but I do not know how to predict its energy with the pre-trained model. I would need something like energy = model(graph, training=False), or energy = model(inputs, training=False) where inputs = [x, a, e, y]. I have tried using a SingleLoader, but it does not work. Given my graph object or data matrices (x, a, e), how can I evaluate the pre-trained model on it?

Thanks in advance,

Alfonso

@alfonsogijon changed the title from "How to save model? How to evaluate the model with new data?" to "How to evaluate a graph regression model with new data?" on Mar 22, 2023
danielegrattarola (Owner) commented:

As you said, you would need to call the model on the matrices describing your graph: energy = model(inputs, training=False).
You just need to take care of converting the adjacency matrix to a sparse tensor using sp_matrix_to_sp_tensor.
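For concreteness, that conversion step can be sketched by hand (illustrative only: Spektral ships sp_matrix_to_sp_tensor for this, so the helper below is just a rough stand-in, and the toy adjacency matrix is made up):

```python
import numpy as np
import scipy.sparse as sp
import tensorflow as tf

def to_sparse_tensor(m):
    """Rough stand-in for Spektral's sp_matrix_to_sp_tensor (illustrative)."""
    m = m.tocoo()  # COO format exposes (row, col, data) triplets directly
    indices = np.stack([m.row, m.col], axis=1).astype(np.int64)
    values = m.data.astype(np.float32)
    # tf.sparse.reorder puts indices in the canonical row-major order
    return tf.sparse.reorder(tf.SparseTensor(indices, values, m.shape))

a = sp.csr_matrix(np.eye(4, dtype=np.float32))  # toy 4-node adjacency matrix
a = to_sparse_tensor(a)                          # tf.SparseTensor, ready for the model
```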


alfonsogijon commented Mar 24, 2023

Thanks, but I am still not able to do it. Why does the following piece of code not work?

x = dataset_tr[10].x
a = dataset_tr[10].a
e = dataset_tr[10].e
y = dataset_tr[10].y

#a = sp_matrix_to_sp_tensor(a)
inputs = [x,a,y]
energy = model(inputs,training=False)

The adjacency matrix is already sparse, because I am reading it from a graph of my training dataset and it was made sparse before training. This is my model; I do not use the edge features in this case:

def call(self, inputs):
        x, a, i = inputs
        x = self.conv1([x, a])
        x = self.conv2([x, a])
        output = self.global_pool([x, i])
        output = self.dense1(output)
        output = self.dense2(output)
        output = self.dense(output)
        return output

An error occurs when doing the first convolution:
'tuple' object has no attribute 'rank'
Call arguments received by layer 'gat_conv' (type GATConv):
• inputs=['tf.Tensor(shape=(228, 4), dtype=float32)', "<228x228 sparse matrix of type '<class 'numpy.float32'>' with 2364 stored elements in Compressed Sparse Row format>"]
• mask=None


danielegrattarola commented Mar 24, 2023

Sorry, it's actually a bit more involved than that.

Your input must contain the i tensor holding the batch index of each node. In this case, since you only have one graph, you just need a tensor of all zeros.

In other words:

i = tf.zeros(x.shape[0])
inputs = [x, a, i]

You don't pass y as an input because that is just a target label.

Also, a must be a sparse tensor, not a SciPy sparse matrix, so you still need the call to sp_matrix_to_sp_tensor.

alfonsogijon (Author) commented:

Ok, now the following code produces an error in the global pooling layer:

x = dataset_tr[10].x
a = dataset_tr[10].a
e = dataset_tr[10].e
i = tf.zeros(x.shape[0])

a = sp_matrix_to_sp_tensor(a)
inputs = [x, a, i]
energy = model(inputs, training=False)

The error message is:

Value for attr 'Tindices' of float is not in the list of allowed values: int32, int64
	; NodeDef: {{node SegmentSum}}; Op<name=SegmentSum; signature=data:T, segment_ids:Tindices -> output:T; attr=T:type,allowed=[DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT16, DT_INT8, DT_COMPLEX64, DT_INT64, DT_QINT8, DT_QUINT8, DT_QINT32, DT_BFLOAT16, DT_UINT16, DT_COMPLEX128, DT_HALF, DT_UINT32, DT_UINT64]; attr=Tindices:type,allowed=[DT_INT32, DT_INT64]> [Op:SegmentSum]

Call arguments received by layer 'global_sum_pool' (type GlobalSumPool):
  • inputs=['tf.Tensor(shape=(417, 56), dtype=float32)', 'tf.Tensor(shape=(417,), dtype=float32)']

The only way I can evaluate the energy of a single graph is to define a disjoint loader over a dataset containing only that graph. But maybe there is a better way to do it.
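For the record, the SegmentSum error above points at the dtype of i: tf.zeros defaults to float32, while the GlobalSumPool layer reduces node features with a SegmentSum op (as the traceback shows), whose segment indices must be int32 or int64. A minimal sketch of that fix, using a toy graph in place of the real data (the shapes and names here are illustrative):

```python
import numpy as np
import tensorflow as tf

# toy stand-in for a single graph: 4 nodes with 3 features each
x = tf.random.normal((4, 3))

# batch index: all nodes belong to graph 0, and the dtype must be
# integer -- tf.zeros(x.shape[0]) alone yields float32, which is what
# triggered the "Value for attr 'Tindices' of float" error above
i = tf.zeros(x.shape[0], dtype=tf.int32)

# segment_sum is the op behind the failing GlobalSumPool call; with
# integer indices it pools the 4 node rows into one graph-level row
pooled = tf.math.segment_sum(x, i)  # shape (1, 3)
```

With i built this way (and a converted via sp_matrix_to_sp_tensor), energy = model([x, a, i], training=False) should evaluate a single graph without wrapping it in a one-graph disjoint loader.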
