model.evaluate() reproducibility problem #432

Open
rmrmg opened this issue May 29, 2023 · 5 comments

rmrmg commented May 29, 2023

I tried to mimic this example: https://github.com/danielegrattarola/spektral/blob/master/examples/node_prediction/citation_gcn.py
with custom data (multiple graphs and a BatchLoader) and a regression task. The full Python script is in the attached file
batchGCN.py.txt
The crucial part of the code:

    data = TestData(normalize_x=True, transforms=[LayerPreprocess(GCNConv)])
    idxs = numpy.random.permutation(len(data))
    pivot = int(0.8 * len(data))
    idx_tr, idx_te = numpy.split(idxs, [pivot, ])
    data_tr = data[idx_tr]
    data_test = data[idx_te]
    loader_tr = BatchLoader(data_tr)
    loader_test = BatchLoader(data_test)

    N = 100 #data.n_nodes  # Number of nodes in the graph
    F = data.n_node_features
    x_in = Input(shape=(F,))
    a_in = Input((N,), sparse=True)
    output = GCNConv(1, activation="relu", use_bias=False)([x_in, a_in])

    # Build model
    model = Model(inputs=[x_in, a_in], outputs=output)
    optimizer = Adam(learning_rate=0.003)
    model.compile(optimizer=optimizer, loss="mse", weighted_metrics=["acc"])
    model.summary()

    model.fit(loader_tr.load(), steps_per_epoch=loader_tr.steps_per_epoch,
              validation_data=loader_test.load(), validation_steps=loader_test.steps_per_epoch,
              epochs=epochs, callbacks=[EarlyStopping(patience=patience, restore_best_weights=True)])

    for i in range(5):
        loss = model.evaluate(loader_test, steps=loader_test.steps_per_epoch)
        print("LOST", loss)
  1. A question: what should N be in my case? I guess it should be at least the size of the biggest graph, and a higher value means nothing more than extra padding and extra training time. Is my guess correct? And what happens when N is smaller than the number of nodes? (See also the shape-inspection sketch after the logs below.)

  2. A problem: multiple runs of the same model.evaluate() produce different results, and this does not depend on N (i.e. the fluctuations are observed for every tested N, whether bigger or smaller than the biggest graph). Is this a bug in my code or an issue with Spektral?
    epochs=20, N=10

29/29 [==============================] - 1s 16ms/step - loss: 4.3498 - acc: 0.0000e+00
LOST [4.349807262420654, 0.0]
29/29 [==============================] - 0s 13ms/step - loss: 4.2849 - acc: 0.0000e+00
LOST [4.284859657287598, 0.0]
29/29 [==============================] - 1s 17ms/step - loss: 4.2611 - acc: 0.0000e+00
LOST [4.26109504699707, 0.0]
29/29 [==============================] - 0s 11ms/step - loss: 4.3629 - acc: 0.0000e+00
LOST [4.3628973960876465, 0.0]
29/29 [==============================] - 0s 13ms/step - loss: 4.2602 - acc: 0.0000e+00
LOST [4.260205268859863, 0.0]

epochs=20, N=100

29/29 [==============================] - 1s 16ms/step - loss: 4.0418 - acc: 0.0000e+00
LOST [4.041775226593018, 0.0]
29/29 [==============================] - 0s 16ms/step - loss: 4.1152 - acc: 0.0000e+00
LOST [4.115159511566162, 0.0]
29/29 [==============================] - 1s 15ms/step - loss: 3.9473 - acc: 0.0000e+00
LOST [3.947335720062256, 0.0]
29/29 [==============================] - 0s 13ms/step - loss: 3.9188 - acc: 0.0000e+00
LOST [3.9188196659088135, 0.0]
29/29 [==============================] - 1s 16ms/step - loss: 3.9684 - acc: 0.0000e+00
LOST [3.968449115753174, 0.0]
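
To check what the loader actually produces, I inspect the padded shapes like this (a minimal sketch; data is the dataset object from the snippet above):

    # Pull one batch from the loader and look at the padded shapes.
    loader = BatchLoader(data, batch_size=10)
    inputs, y = next(loader)
    x, a = inputs
    print(x.shape)  # (batch, n_max, n_node_features); n_max = largest graph in this batch
    print(a.shape)  # (batch, n_max, n_max)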
danielegrattarola (Owner) commented

This is a better starting point for a batch-mode model; can you try adapting your code/model to this example instead?
https://github.com/danielegrattarola/spektral/blob/master/examples/graph_prediction/qm9_ecc_batch.py

Cheers

rmrmg (Author) commented Jun 2, 2023

Yes, I can, but I am not sure how to do this - what is the goal and what should be adapted?
Here are my thoughts:

  1. I don't have edge properties, so ECCConv is rather a questionable starting point, but that is probably not the point and I can take GCNConv instead.
  2. In the qm9 model the last layer is Dense(n_out); this is nice for a global (aka graph-level) property, but I want to learn node properties, so it is rather not for me.
  3. Based on 1 and 2, I think I can stay with a 1-layer GCNConv network (at least for test purposes).
  4. Masking: mask=True in the loader, and then self.masking = GraphMasking() and x = self.masking(x) - this surely helps (I have a problem with a trained model which predicts 0 for all nodes - using a mask probably solves this).

So can I change the model definition (as in 1 and 2) in the qm9 example and use my loader?

danielegrattarola (Owner) commented

For 1 and 2, you don't have to use the same model, but I think it would be easier for you to start from that code since you were struggling with batch mode.

I also suggest using model subclassing as in the batch mode example, instead of the old functional API of Keras that you are using in your code.

Cheers

rmrmg (Author) commented Jun 4, 2023

Thanks for the reply. I did the following:

# Imports needed to run the snippet (catalset is my own dataset module, not part of spektral)
import numpy

from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

from spektral.data import BatchLoader
from spektral.layers import GCNConv, GraphMasking
from spektral.transforms import LayerPreprocess

import catalset


class GNN(Model):
    def __init__(self):
        super().__init__()
        self.masking = GraphMasking()
        self.conv1 = GCNConv(1, activation="relu")

    def call(self, inputs):
        x, a = inputs
        x = self.masking(x)
        output = self.conv1([x, a])
        return output


def train(model, learning_rate=1e-2, epochs=20):
    optimizer = Adam(learning_rate)
    model.compile(optimizer=optimizer, loss="mse")
    data = catalset.PDPData(normalize_x=True, transforms=[LayerPreprocess(GCNConv)])
    idxs = numpy.random.permutation(len(data))
    pivot = int(0.8 * len(data))
    idx_tr, idx_te = numpy.split(idxs, [pivot, ])
    data_tr = data[idx_tr]
    data_test = data[idx_te]
    batch_size = 10
    loader_tr = BatchLoader(data_tr, mask=False, batch_size=batch_size)
    loader_test = BatchLoader(data_test, mask=False, batch_size=batch_size)

    model.fit(loader_tr.load(), steps_per_epoch=loader_tr.steps_per_epoch, epochs=epochs)
    print("Testing model")
    loss = model.evaluate(loader_test.load(), steps=loader_test.steps_per_epoch)
    print("Done. Test loss: {}".format(loss))


if __name__ == "__main__":
    model = GNN()
    train(model)

and this ends up with:

 File "/home/rmrmg/anaconda3/envs/alfabet/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 98, in convert_to_eager_tensor
    return ops.EagerTensor(value, ctx.device_name, dtype)
ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type numpy.ndarray).

This is because loader.load() returns:
[(<class 'numpy.ndarray'>, (10, 50, 6), dtype('float32')), (<class 'numpy.ndarray'>, (10, 50, 50), dtype('float64'))] (<class 'numpy.ndarray'> (10,) object)
or, in plain English, the second element (Y) is of dtype object.
In PDPData, read() builds the graphs as [Graph(x=x, a=csr_matrix(a), y=y) for (x, a, y) in full_data],
where y is of shape (N, 1) and N is the number of nodes in the graph (I also tried (N,), but the effect was the same).
Two years ago you wrote here: "BatchLoader only supports graph-level labels (meaning that labels do not get zero-padded -- that would not make sense) so all labels should have the same shape".
So, in the code presented above, I changed BatchLoader to DisjointLoader and the model class to:

class GNN(Model):
    def __init__(self):
        super().__init__()
        self.conv1 = GCNConv(1, activation="relu")

    def call(self, inputs):
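        # DisjointLoader yields (x, a, i); the graph-index vector i is not needed here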
        x, a, _ = inputs
        output = self.conv1([x, a])
        return output

and I got, I think, a somewhat similar problem caused by the dtype=object of Y - the errors for both versions (y in the Graph constructor of shape (N,) and of shape (N, 1)) are attached.
error_N.1.txt
error_N.txt
I am lost and think the project needs more documentation.

danielegrattarola (Owner) commented Jun 5, 2023

If the labels have dtype object, it likely means that they cannot be stacked; this is typical numpy behavior.
Have you checked the contents of y and made sure that all of them have the same size?
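
For example, this is the kind of thing that produces an object array (a minimal sketch of the numpy behavior):

    import numpy as np

    # Per-node labels for graphs with 3 and 5 nodes: the shapes differ, so they cannot be stacked.
    ys = [np.ones((3, 1)), np.ones((5, 1))]
    y = np.array(ys, dtype=object)  # older NumPy does this implicitly; newer versions require dtype=object
    print(y.dtype)  # object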

Anyway, the loaders are there to simplify users' lives but if they become a problem you can always write your data loading pipeline from scratch so that you have full control over it. Writing a training loop in TF is pretty easy nowadays, there's an example here.
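
In short, something like this (a minimal sketch, with model and loader_tr as in your code; not the exact example linked above):

    import tensorflow as tf

    optimizer = tf.keras.optimizers.Adam(1e-2)
    loss_fn = tf.keras.losses.MeanSquaredError()

    for step, (inputs, target) in enumerate(loader_tr, start=1):
        with tf.GradientTape() as tape:
            predictions = model(inputs, training=True)
            loss = loss_fn(target, predictions)
        gradients = tape.gradient(loss, model.trainable_weights)
        optimizer.apply_gradients(zip(gradients, model.trainable_weights))
        if step == loader_tr.steps_per_epoch:  # loaders yield batches forever, so stop after one epoch
            break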

The issue you mentioned is no longer relevant, and as you see in the documentation:

If node_level=False, the labels are interpreted as graph-level labels and are returned as an array of shape [batch, n_labels]. If node_level=True, then the labels are padded along the node dimension and are returned as an array of shape [batch, n_max, n_labels].
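
So with a current version of Spektral, per-node labels would be loaded like this (a sketch, reusing the names from your code and assuming a version that has the node_level argument):

    # node_level=True pads y along the node dimension together with x;
    # mask=True appends a binary mask to x, which the GraphMasking layer consumes.
    loader_tr = BatchLoader(data_tr, node_level=True, mask=True, batch_size=batch_size)
    loader_test = BatchLoader(data_test, node_level=True, mask=True, batch_size=batch_size)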

Cheers
