ValueError: Input graph and Layer graph are not the same #314

imLogM · 2020-10-22T04:01:32Z

euler2.0
The example from the wiki: examples/graphsage/python run_graphsage.py
By default this example uses estimator.train() and estimator.evaluate() for training and validation.
If you switch to estimator.train_and_evaluate(), it fails with: ValueError: Input graph and Layer graph are not the same: Tensor("MPGather:0", shape=(?, 1433), dtype=float32) is not from the passed-in graph.
(I hit this in a distributed environment; I have not checked whether a single-machine run fails too.)

With tf.estimator, every tf op has to be created inside model_fn or input_fn; otherwise you get this error.

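Take the source file that raises the error as an example: tf_euler/python/convolution/sage_conv.py.
__init__ is only called the first time the graph is built. With estimator.train or estimator.evaluate the graph is built once for the whole run, so nothing goes wrong.
With estimator.train_and_evaluate, however, the evaluate phase builds the graph multiple times. From the second build onward the tf ops in __init__ are not run again, which triggers the error. The correct approach is to move every tf op out of __init__.
In the attached screenshot, the commented-out lines are the original code that causes the error and the highlighted lines are the corrected version. This problem shows up all over euler2.0 and there are a great many places to change, so I couldn't be bothered to open a pull request; I'm writing this issue to help others track the problem down.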
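
To make the mechanism described above concrete, here is a minimal, framework-free sketch. It does not use TensorFlow or Euler: ToyGraph, ToyOp, combine, OpsInInit, OpsInCall, build and build_fresh are all hypothetical names invented for illustration. The only assumption is that an op belongs to the graph it was created in and cannot be combined with ops from a different graph.

```python
# Toy model of graph ownership. None of these classes are TensorFlow or
# Euler classes; they are hypothetical stand-ins for illustration only.

class ToyGraph:
    """Stands in for a computation graph; it only records the ops it owns."""
    def __init__(self, name):
        self.name = name
        self.ops = []

    def add_op(self, label):
        op = ToyOp(label, self)
        self.ops.append(op)
        return op


class ToyOp:
    def __init__(self, label, graph):
        self.label = label
        self.graph = graph


def combine(graph, *inputs):
    """Create an op in `graph`, rejecting inputs that live in another graph."""
    for op in inputs:
        if op.graph is not graph:
            raise ValueError(
                "%s is not from the passed-in graph %s" % (op.label, graph.name))
    return graph.add_op("combine")


class OpsInInit:
    """Creates its op in __init__, so the op is bound to the first graph."""
    def __init__(self, graph):
        self.gathered = graph.add_op("gather")

    def __call__(self, graph, x):
        return combine(graph, x, self.gathered)


class OpsInCall:
    """Creates its op inside __call__, so every graph gets its own copy."""
    def __call__(self, graph, x):
        gathered = graph.add_op("gather")
        return combine(graph, x, gathered)


def build(layer, graph):
    """Mimics one graph-building pass, for example one evaluation round."""
    return layer(graph, graph.add_op("input"))


def build_fresh(graph):
    """Constructs the layer inside the per-graph build, like code in model_fn."""
    return build(OpsInInit(graph), graph)


train_graph, eval_graph = ToyGraph("train"), ToyGraph("eval")

# Layer constructed once and reused: the second build sees a stale op.
bad = OpsInInit(train_graph)
build(bad, train_graph)
try:
    build(bad, eval_graph)
    second_build_failed = False
except ValueError:
    second_build_failed = True

# Ops created in __call__: each build gets ops from the right graph.
good = OpsInCall()
build(good, train_graph)
out_call = build(good, eval_graph)

# Layer constructed per graph: __init__ runs again for every build.
build_fresh(train_graph)
out_fresh = build_fresh(eval_graph)
```

In this toy model both remedies avoid the error: creating the op inside the call, or constructing the whole layer inside the per-graph build function, which is the analogue of keeping everything inside model_fn.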

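Note: this change can introduce a new problem. For example:
Suppose __init__ contains self.fc = tf.layers.Dense(dim), and call does a = self.fc(x); b = self.fc(y). Then a and b go through the same network.
But if you delete self.fc = tf.layers.Dense(dim) from __init__ and write a = tf.layers.Dense(dim)(x); b = tf.layers.Dense(dim)(y) in call, then a and b go through two different networks.
So wherever parameters are supposed to be shared, you also need to think about how sharing is preserved.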
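
The sharing pitfall described above can be reproduced without TensorFlow. In the sketch below, ToyDense is a hypothetical stand-in for a dense layer (one weight vector per instance), and SharedInInit, UnsharedInCall and SharedInCall are illustrative names, not Euler code. Both branches are fed the same input, so any difference between the two outputs can only come from different weights.

```python
import random


class ToyDense:
    """Hypothetical dense layer: each instance owns its own weight vector."""
    def __init__(self, dim, rng):
        self.weights = [rng.random() for _ in range(dim)]

    def __call__(self, x):
        return sum(w * v for w, v in zip(self.weights, x))


class SharedInInit:
    """Original style: one layer made in __init__ and applied twice."""
    def __init__(self, dim, rng):
        self.fc = ToyDense(dim, rng)

    def call(self, x, y):
        return self.fc(x), self.fc(y)


class UnsharedInCall:
    """Naive migration: two separate layers, so weights are no longer shared."""
    def __init__(self, dim, rng):
        self.dim, self.rng = dim, rng

    def call(self, x, y):
        a = ToyDense(self.dim, self.rng)(x)
        b = ToyDense(self.dim, self.rng)(y)
        return a, b


class SharedInCall:
    """Layer made once per call and reused, which keeps the weights shared."""
    def __init__(self, dim, rng):
        self.dim, self.rng = dim, rng

    def call(self, x, y):
        fc = ToyDense(self.dim, self.rng)
        return fc(x), fc(y)


rng = random.Random(0)
x = [1.0, 2.0, 3.0]

shared = SharedInInit(3, rng).call(x, x)
unshared = UnsharedInCall(3, rng).call(x, x)
local = SharedInCall(3, rng).call(x, x)
```

Here shared and local each hold two identical outputs, while unshared holds two different ones. One way to keep sharing after moving code out of __init__ is therefore to create the layer once inside call and apply that single object to both inputs.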

guizhiyi · 2021-07-15T12:05:35Z

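Hi, I ran into this problem too and tried your approach. But both training and eval went wrong: the loss is extremely small and MRR stays at 1. Have you run into this, and if so, how did you solve it?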
