[Tensorflow] tensorflow.python.framework.errors_impl.FailedPreconditionError: Could not find variable 52/kernel. This could mean that the variable has been deleted. In TF1, it can also mean the variable is uninitialized. #21

Closed
peiwenhuang27 opened this issue Jul 23, 2021 · 10 comments

Comments

@peiwenhuang27

Version: LPOT 1.5, Tensorflow 2.5, Intel-Tensorflow 2.5
Env: Google Colab

I was using a Keras saved model for quantization, and the following error occurs:

2021-07-23 03:49:12 [WARNING] There is no quantizable op type!!!
2021-07-23 03:49:12 [INFO] Getting FP32 model baseline...
2021-07-23 03:49:12 [INFO] Start to evaluate Tensorflow model...
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 1375, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 1360, in _run_fn
    target_list, run_metadata)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 1453, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Could not find variable 52/kernel. This could mean that the variable has been deleted. In TF1, it can also mean the variable is uninitialized. Debug info: container=localhost, status=Not found: Container localhost does not exist. (Could not find resource: localhost/52/kernel)
	 [[{{node model/52/Conv2D/ReadVariableOp}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 110, in <module>
    evaluate_opt_graph.run()
  File "main.py", line 94, in run
    q_model = quantizer()
  File "/usr/local/lib/python3.7/dist-packages/lpot/experimental/quantization.py", line 177, in __call__
    self.strategy.traverse()
  File "/usr/local/lib/python3.7/dist-packages/lpot/strategy/strategy.py", line 286, in traverse
    self.baseline = self._evaluate(self.model)
  File "/usr/local/lib/python3.7/dist-packages/lpot/strategy/strategy.py", line 424, in _evaluate
    val = self.objective.evaluate(eval_func, model)
  File "/usr/local/lib/python3.7/dist-packages/lpot/objective.py", line 213, in evaluate
    acc = eval_func(model)
  File "/usr/local/lib/python3.7/dist-packages/lpot/utils/create_obj_from_config.py", line 131, in eval_func
    tensorboard, fp32_baseline)
  File "/usr/local/lib/python3.7/dist-packages/lpot/adaptor/tensorflow.py", line 210, in evaluate
    predictions = model.sess.run(output_tensor, feed_dict)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 968, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 1191, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 1369, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 1394, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Could not find variable 52/kernel. This could mean that the variable has been deleted. In TF1, it can also mean the variable is uninitialized. Debug info: container=localhost, status=Not found: Container localhost does not exist. (Could not find resource: localhost/52/kernel)
	 [[{{node model/52/Conv2D/ReadVariableOp}}]]

Also, I don't know why the system prints [WARNING] There is no quantizable op type!!!, because my model contains Conv2D operations and Matmul operations, which are clearly quantizable.

@guomingz
Contributor

  1. For the issue you pasted, does the model run successfully without LPOT, e.g. with your own script?
  2. May I know the exact TensorFlow version in your environment?

image

  3. If possible, could you please show us the subgraph structure of your Conv2D/MatMul ops?

@peiwenhuang27
Author

  1. Yes, I loaded my model with my own script for inference; it runs smoothly and correctly outputs the MSE loss. Here is a snippet of the code:
import tensorflow as tf

# x1, x2, y and model_path are defined elsewhere in the script
with tf.compat.v1.Session() as sess:
    # Load the SavedModel (graph and variables) into this session
    tf.compat.v1.saved_model.loader.load(sess, ['serve'], model_path)
    output = sess.graph.get_tensor_by_name('output:0')
    predictions = sess.run(output, {'first_input:0': x1[:64], 'second_input:0': x2[:64]})
    mse = tf.reduce_mean(tf.keras.losses.mean_squared_error(y[:64], predictions))
    print(mse.eval())
  2. As for the versions of TensorFlow and Intel-TensorFlow, both are 2.5.0:

Screen Shot 2021-07-23 at 14 09 09

  3. My subgraph structure looks roughly like the one below:
    (Here I used an ONNX model converted from the Keras saved model because it is more readable, but the two have essentially the same structure.)

Screen Shot 2021-07-23 at 17 27 16

Screen Shot 2021-07-23 at 17 36 12

Is "Conv2D + LeakyRelu" a supported pattern in TensorFlow? On the other hand, LPOT recognizes the MatMul in my model as quantizable, but when quantizing it always prints Unknown Matmul.

@guomingz
Contributor

guomingz commented Jul 23, 2021

Could you please share the model if possible? Even part of the model would help us debug effectively, since a description in words alone doesn't tell us much :)

Would you please uninstall native TensorFlow and Intel-TensorFlow from your environment so that only Intel-TensorFlow remains installed? My guess is that LPOT didn't identify the TensorFlow version correctly, so it fell back to its default Conv2D quantization configuration, which doesn't support quantizing a single convolution. Conv2D + LeakyRelu is not supported for TF 2.x; tf1.15.up3 supports Conv2D + BiasAdd + LeakyRelu fusion.
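
To confirm which TensorFlow packages are actually installed side by side, something like the following could be run in the same environment. This is only a minimal sketch: the helper name find_tensorflow_distributions is made up for illustration, and it assumes Python 3.8+ (on the Python 3.7 shown in the tracebacks, the importlib_metadata backport would be needed instead).

```python
from importlib import metadata  # Python 3.8+; on 3.7 use the importlib_metadata backport


def find_tensorflow_distributions():
    """Return {distribution name: version} for every installed package
    whose name contains 'tensorflow' (e.g. tensorflow, intel-tensorflow).

    Hypothetical helper for illustration; it is not part of LPOT or TensorFlow.
    """
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name and "tensorflow" in name.lower():
            found[name] = dist.version
    return found


if __name__ == "__main__":
    # More than one core TensorFlow distribution here suggests a mixed install.
    for name, version in sorted(find_tensorflow_distributions().items()):
        print(f"{name}=={version}")
```

If both tensorflow and intel-tensorflow show up, the environment still contains the mixed install described above.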

For MatMul quantization, TensorFlow doesn't support quantizing a single MatMul, so LPOT adds an extra pass, InjectDummyBiasAddOptimizer, to handle this case.

Did you see a log line like 2021-07-23 21:37:41 [INFO] Pass InjectDummyBiasAddOptimizer elapsed time: 2.86 ms on your side?

Could you please paste the full log if possible?
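
For a long log, checking for that pass by eye is error-prone. The sketch below pulls the pass names out of log text. It assumes the line format quoted above ("[INFO] Pass <name> elapsed time: <n> ms"), and the function names passes_in_log and has_warning are hypothetical, not LPOT APIs.

```python
import re

# Matches lines such as:
# 2021-07-23 21:37:41 [INFO] Pass InjectDummyBiasAddOptimizer elapsed time: 2.86 ms
PASS_LINE = re.compile(
    r"\[INFO\] Pass (?P<name>.+?) elapsed time: (?P<ms>[\d.]+) ms"
)


def passes_in_log(log_text):
    """Return (pass name, elapsed ms) pairs in the order they appear."""
    return [
        (m.group("name"), float(m.group("ms")))
        for m in PASS_LINE.finditer(log_text)
    ]


def has_warning(log_text, text="There is no quantizable op type"):
    """Return True if any [WARNING] line contains the given text."""
    return any(
        "[WARNING]" in line and text in line
        for line in log_text.splitlines()
    )


if __name__ == "__main__":
    sample = (
        "2021-07-23 21:37:41 [INFO] Pass InjectDummyBiasAddOptimizer "
        "elapsed time: 2.86 ms\n"
        "2021-07-23 03:49:12 [WARNING] There is no quantizable op type!!!\n"
    )
    names = [name for name, _ in passes_in_log(sample)]
    print("InjectDummyBiasAddOptimizer" in names, has_warning(sample))
```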

@peiwenhuang27
Author

Of course! Thanks again for working on such a great package!

For privacy reasons, I decided to email you the model (only initialized, without training) instead of posting it here. Please check your inbox for the file, thank you.

I have also tried uninstalling TensorFlow and keeping only Intel-TensorFlow in the environment, but the issue still occurs.
I did see InjectDummyBiasAddOptimizer in the log, along with ConvertLeakyReluOptimizer and ConvertAddToBiasAddOptimizer. The full log is pasted below:

2021-07-23 03:49:07 [INFO] Generating grammar tables from /usr/lib/python3.7/lib2to3/Grammar.txt
2021-07-23 03:49:07 [INFO] Generating grammar tables from /usr/lib/python3.7/lib2to3/PatternGrammar.txt
2021-07-23 03:49:10.121954: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-07-23 03:49:10.123098: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
2021-07-23 03:49:10 [INFO] loading session....
2021-07-23 03:49:10.503093: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2299995000 Hz
2021-07-23 03:49:11 [INFO] ConvertLayoutOptimizer elapsed time: 0.08 ms
2021-07-23 03:49:11.303129: I tensorflow/core/grappler/devices.cc:78] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0 (Note: TensorFlow was not compiled with CUDA or ROCm support)
2021-07-23 03:49:11.303379: I tensorflow/core/grappler/clusters/single_machine.cc:357] Starting new session
2021-07-23 03:49:11.442033: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1144] Optimization results for grappler item: graph_to_optimize
  model_pruner: Graph size after: 36 nodes (-36), 34 edges (-99), time = 0.583ms.
  shape_optimizer: shape_optimizer did nothing. time = 0.187ms.
  dependency_optimizer: Graph size after: 36 nodes (0), 34 edges (0), time = 0.306ms.
  debug_stripper: debug_stripper did nothing. time = 0.149ms.
  loop_optimizer: Graph size after: 36 nodes (0), 34 edges (0), time = 0.208ms.
  model_pruner: Graph size after: 36 nodes (0), 34 edges (0), time = 0.258ms.
  shape_optimizer: shape_optimizer did nothing. time = 0.133ms.
  dependency_optimizer: Graph size after: 36 nodes (0), 34 edges (0), time = 0.277ms.
  debug_stripper: debug_stripper did nothing. time = 0.124ms.
Optimization results for grappler item: model_rnn_while_cond_1116
  model_pruner: Graph size after: 12 nodes (-1), 3 edges (-1), time = 0.073ms.
  shape_optimizer: shape_optimizer did nothing. time = 0.012ms.
  dependency_optimizer: Graph size after: 12 nodes (0), 3 edges (0), time = 0.051ms.
  debug_stripper: debug_stripper did nothing. time = 0.005ms.
  loop_optimizer: Graph size after: 12 nodes (0), 3 edges (0), time = 0.047ms.
  model_pruner: Graph size after: 12 nodes (0), 3 edges (0), time = 0.039ms.
  shape_optimizer: shape_optimizer did nothing. time = 0.006ms.
  dependency_optimizer: Graph size after: 12 nodes (0), 3 edges (0), time = 0.057ms.
  debug_stripper: debug_stripper did nothing. time = 0.006ms.
Optimization results for grappler item: model_rnn_while_body_1117
  model_pruner: Graph size after: 52 nodes (0), 64 edges (0), time = 0.216ms.
  shape_optimizer: shape_optimizer did nothing. time = 0.028ms.
  dependency_optimizer: Graph size after: 49 nodes (-3), 55 edges (-9), time = 0.304ms.
  debug_stripper: debug_stripper did nothing. time = 0.022ms.
  loop_optimizer: Graph size after: 49 nodes (0), 55 edges (0), time = 0.207ms.
  model_pruner: Graph size after: 49 nodes (0), 55 edges (0), time = 0.192ms.
  shape_optimizer: shape_optimizer did nothing. time = 0.021ms.
  dependency_optimizer: Graph size after: 49 nodes (0), 55 edges (0), time = 0.273ms.
  debug_stripper: debug_stripper did nothing. time = 0.022ms.
Optimization results for grappler item: model_rnn_1_while_body_898
  model_pruner: Graph size after: 52 nodes (0), 64 edges (0), time = 0.201ms.
  shape_optimizer: shape_optimizer did nothing. time = 0.027ms.
  dependency_optimizer: Graph size after: 49 nodes (-3), 55 edges (-9), time = 0.297ms.
  debug_stripper: debug_stripper did nothing. time = 0.023ms.
  loop_optimizer: Graph size after: 49 nodes (0), 55 edges (0), time = 0.214ms.
  model_pruner: Graph size after: 49 nodes (0), 55 edges (0), time = 0.201ms.
  shape_optimizer: shape_optimizer did nothing. time = 0.023ms.
  dependency_optimizer: Graph size after: 49 nodes (0), 55 edges (0), time = 0.26ms.
  debug_stripper: debug_stripper did nothing. time = 0.022ms.
Optimization results for grappler item: __inference_signature_wrapper_4205
  model_pruner: Graph size after: 36 nodes (-1), 35 edges (-2), time = 0.449ms.
  shape_optimizer: shape_optimizer did nothing. time = 0.08ms.
  dependency_optimizer: Graph size after: 36 nodes (0), 35 edges (0), time = 0.308ms.
  debug_stripper: debug_stripper did nothing. time = 0.062ms.
  loop_optimizer: Graph size after: 36 nodes (0), 35 edges (0), time = 0.316ms.
  model_pruner: Graph size after: 36 nodes (0), 35 edges (0), time = 0.284ms.
  shape_optimizer: shape_optimizer did nothing. time = 0.06ms.
  dependency_optimizer: Graph size after: 36 nodes (0), 35 edges (0), time = 0.29ms.
  debug_stripper: debug_stripper did nothing. time = 0.056ms.
Optimization results for grappler item: __inference__wrapped_model_1233
  model_pruner: Graph size after: 259 nodes (-90), 298 edges (-76), time = 1.645ms.
  shape_optimizer: shape_optimizer did nothing. time = 0.285ms.
  dependency_optimizer: Graph size after: 259 nodes (0), 268 edges (-30), time = 1.863ms.
  debug_stripper: debug_stripper did nothing. time = 0.222ms.
  loop_optimizer: Graph size after: 259 nodes (0), 268 edges (0), time = 1.501ms.
  model_pruner: Graph size after: 259 nodes (0), 268 edges (0), time = 1.299ms.
  shape_optimizer: shape_optimizer did nothing. time = 0.191ms.
  dependency_optimizer: Graph size after: 259 nodes (0), 268 edges (0), time = 1.657ms.
  debug_stripper: debug_stripper did nothing. time = 0.34ms.
Optimization results for grappler item: model_rnn_1_while_cond_897
  model_pruner: Graph size after: 12 nodes (-1), 3 edges (-1), time = 0.08ms.
  shape_optimizer: shape_optimizer did nothing. time = 0.013ms.
  dependency_optimizer: Graph size after: 12 nodes (0), 3 edges (0), time = 0.061ms.
  debug_stripper: debug_stripper did nothing. time = 0.007ms.
  loop_optimizer: Graph size after: 12 nodes (0), 3 edges (0), time = 0.049ms.
  model_pruner: Graph size after: 12 nodes (0), 3 edges (0), time = 0.039ms.
  shape_optimizer: shape_optimizer did nothing. time = 0.006ms.
  dependency_optimizer: Graph size after: 12 nodes (0), 3 edges (0), time = 0.043ms.
  debug_stripper: debug_stripper did nothing. time = 0.006ms.

2021-07-23 03:49:11 [INFO] Pass GrapplerOptimizer elapsed time: 425.95 ms
2021-07-23 03:49:11 [INFO] Pass RemoveTrainingNodesOptimizer elapsed time: 13.12 ms
2021-07-23 03:49:11 [INFO] Pass SplitSharedInputOptimizer elapsed time: 1.16 ms
2021-07-23 03:49:11 [INFO] Pass GraphFoldConstantOptimizer elapsed time: 0.39 ms
2021-07-23 03:49:11 [INFO] Pass FuseColumnWiseMulOptimizer elapsed time: 0.51 ms
2021-07-23 03:49:11 [INFO] Pass StripUnusedNodesOptimizer elapsed time: 1.76 ms
2021-07-23 03:49:11 [INFO] Pass GraphCseOptimizer elapsed time: 0.44 ms
2021-07-23 03:49:11 [INFO] Pass FoldBatchNormNodesOptimizer elapsed time: 0.58 ms
2021-07-23 03:49:11 [INFO] Pass UpdateEnterOptimizer elapsed time: 0.25 ms
2021-07-23 03:49:11 [INFO] Pass ConvertLeakyReluOptimizer elapsed time: 0.47 ms
2021-07-23 03:49:11 [INFO] Pass InjectDummyBiasAddOptimizer elapsed time: 0.45 ms
2021-07-23 03:49:11 [INFO] Pass ConvertAddToBiasAddOptimizer elapsed time: 0.48 ms
2021-07-23 03:49:11 [INFO] Pass FuseTransposeReshapeOptimizer elapsed time: 0.46 ms
2021-07-23 03:49:11 [INFO] Pass FuseConvWithMathOptimizer elapsed time: 0.45 ms
2021-07-23 03:49:11 [INFO] Pass Pre Optimization elapsed time: 929.93 ms
2021-07-23 03:49:12 [WARNING] There is no quantizable op type!!!
2021-07-23 03:49:12 [INFO] Getting FP32 model baseline...
2021-07-23 03:49:12 [INFO] Start to evaluate Tensorflow model...
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 1375, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 1360, in _run_fn
    target_list, run_metadata)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 1453, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Could not find variable 52/kernel. This could mean that the variable has been deleted. In TF1, it can also mean the variable is uninitialized. Debug info: container=localhost, status=Not found: Container localhost does not exist. (Could not find resource: localhost/52/kernel)
	 [[{{node model/52/Conv2D/ReadVariableOp}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 110, in <module>
    evaluate_opt_graph.run()
  File "main.py", line 94, in run
    q_model = quantizer()
  File "/usr/local/lib/python3.7/dist-packages/lpot/experimental/quantization.py", line 177, in __call__
    self.strategy.traverse()
  File "/usr/local/lib/python3.7/dist-packages/lpot/strategy/strategy.py", line 286, in traverse
    self.baseline = self._evaluate(self.model)
  File "/usr/local/lib/python3.7/dist-packages/lpot/strategy/strategy.py", line 424, in _evaluate
    val = self.objective.evaluate(eval_func, model)
  File "/usr/local/lib/python3.7/dist-packages/lpot/objective.py", line 213, in evaluate
    acc = eval_func(model)
  File "/usr/local/lib/python3.7/dist-packages/lpot/utils/create_obj_from_config.py", line 131, in eval_func
    tensorboard, fp32_baseline)
  File "/usr/local/lib/python3.7/dist-packages/lpot/adaptor/tensorflow.py", line 210, in evaluate
    predictions = model.sess.run(output_tensor, feed_dict)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 968, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 1191, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 1369, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 1394, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Could not find variable 52/kernel. This could mean that the variable has been deleted. In TF1, it can also mean the variable is uninitialized. Debug info: container=localhost, status=Not found: Container localhost does not exist. (Could not find resource: localhost/52/kernel)
	 [[{{node model/52/Conv2D/ReadVariableOp}}]]

@guomingz
Contributor

guomingz commented Jul 27, 2021

OK, I saw your mail and will give it a try later.
By the way, would you please share the yaml configuration for this model so I can configure the dataloader?
Or could you share the pb with the above structure?
https://user-images.githubusercontent.com/45028573/126763292-0dffe116-b97e-4860-bda9-1c136726e321.png

@peiwenhuang27
Author

Of course! I have sent the link to the yaml configuration, main.py, and data files; please check your inbox.
I'm not sure what you mean by sharing the pb with the above structure. I already sent the structure in saved model format; should I also send a frozen pb file? (I'm having trouble converting the saved model to pb, though.)
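
On converting the saved model to a frozen pb, one common TF 2.x route is to inline the variables as constants. The following is a minimal sketch under several assumptions: TensorFlow 2.x is installed, the SavedModel exports a serving_default signature, and the function names freeze_saved_model and write_frozen_pb are hypothetical. convert_variables_to_constants_v2 lives in a non-public TensorFlow module, and whether this works for a model with RNN while loops, or resolves the error above, is not confirmed in this thread.

```python
def freeze_saved_model(saved_model_dir, signature_key="serving_default"):
    """Load a SavedModel and return a GraphDef with variables inlined as constants.

    Hypothetical helper; signature_key is an assumption about the export.
    """
    # Imported lazily so the module can be imported without TensorFlow installed.
    import tensorflow as tf
    from tensorflow.python.framework.convert_to_constants import (
        convert_variables_to_constants_v2,
    )

    loaded = tf.saved_model.load(saved_model_dir)
    concrete_func = loaded.signatures[signature_key]
    # Replace ReadVariableOp inputs with Const nodes holding the variable values.
    frozen_func = convert_variables_to_constants_v2(concrete_func)
    return frozen_func.graph.as_graph_def()


def write_frozen_pb(graph_def, output_dir, name="frozen_model.pb"):
    """Serialize the frozen GraphDef to output_dir/name in binary form."""
    import tensorflow as tf

    tf.io.write_graph(graph_def, output_dir, name, as_text=False)
```

A frozen graph has no variables left to initialize, which is why it is often suggested when a session reports that a variable could not be found.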

@peiwenhuang27
Author

Hi, just checking in to see if there is any update on the issue. I tried running quantization under TF after a small modification of the model (converting it from a static input shape to a dynamic input shape), and a similar error still occurs:

2021-08-09 02:17:11 [INFO] Pass GrapplerOptimizer elapsed time: 577.41 ms
2021-08-09 02:17:11 [INFO] Pass RemoveTrainingNodesOptimizer elapsed time: 10.71 ms
2021-08-09 02:17:11 [INFO] Pass SplitSharedInputOptimizer elapsed time: 1.55 ms
2021-08-09 02:17:11 [INFO] Pass GraphFoldConstantOptimizer elapsed time: 0.35 ms
2021-08-09 02:17:11 [INFO] Pass FuseColumnWiseMulOptimizer elapsed time: 0.5 ms
2021-08-09 02:17:11 [INFO] Pass StripUnusedNodesOptimizer elapsed time: 1.68 ms
2021-08-09 02:17:11 [INFO] Pass GraphCseOptimizer elapsed time: 0.47 ms
2021-08-09 02:17:11 [INFO] Pass FoldBatchNormNodesOptimizer elapsed time: 0.43 ms
2021-08-09 02:17:11 [INFO] Pass UpdateEnterOptimizer elapsed time: 0.28 ms
2021-08-09 02:17:11 [INFO] Pass ConvertLeakyReluOptimizer elapsed time: 0.43 ms
2021-08-09 02:17:11 [INFO] Pass InjectDummyBiasAddOptimizer elapsed time: 0.46 ms
2021-08-09 02:17:11 [INFO] Pass ConvertAddToBiasAddOptimizer elapsed time: 0.42 ms
2021-08-09 02:17:11 [INFO] Pass FuseTransposeReshapeOptimizer elapsed time: 0.46 ms
2021-08-09 02:17:11 [INFO] Pass FuseConvWithMathOptimizer elapsed time: 0.42 ms
2021-08-09 02:17:11 [INFO] Pass Pre Optimization elapsed time: 1191.54 ms
2021-08-09 02:17:12 [WARNING] There is no quantizable op type!!!
2021-08-09 02:17:12 [INFO] Getting FP32 model baseline...
2021-08-09 02:17:12 [INFO] Start to evaluate Tensorflow model...
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 1375, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 1360, in _run_fn
    target_list, run_metadata)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 1453, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Could not find variable 51/bias. This could mean that the variable has been deleted. In TF1, it can also mean the variable is uninitialized. Debug info: container=localhost, status=Not found: Container localhost does not exist. (Could not find resource: localhost/51/bias)
	 [[{{node model/51/BiasAdd/ReadVariableOp}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 110, in <module>
    evaluate_opt_graph.run()
  File "main.py", line 94, in run
    q_model = quantizer()
  File "/usr/local/lib/python3.7/dist-packages/lpot/experimental/quantization.py", line 177, in __call__
    self.strategy.traverse()
  File "/usr/local/lib/python3.7/dist-packages/lpot/strategy/strategy.py", line 286, in traverse
    self.baseline = self._evaluate(self.model)
  File "/usr/local/lib/python3.7/dist-packages/lpot/strategy/strategy.py", line 424, in _evaluate
    val = self.objective.evaluate(eval_func, model)
  File "/usr/local/lib/python3.7/dist-packages/lpot/objective.py", line 213, in evaluate
    acc = eval_func(model)
  File "/usr/local/lib/python3.7/dist-packages/lpot/utils/create_obj_from_config.py", line 131, in eval_func
    tensorboard, fp32_baseline)
  File "/usr/local/lib/python3.7/dist-packages/lpot/adaptor/tensorflow.py", line 211, in evaluate
    predictions = model.sess.run(output_tensor, feed_dict)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 968, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 1191, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 1369, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 1394, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Could not find variable 51/bias. This could mean that the variable has been deleted. In TF1, it can also mean the variable is uninitialized. Debug info: container=localhost, status=Not found: Container localhost does not exist. (Could not find resource: localhost/51/bias)
	 [[{{node model/51/BiasAdd/ReadVariableOp}}]]

@peiwenhuang27
Author

peiwenhuang27 commented Aug 27, 2021

Hi @guomingz, just checking to see if there is any update on the issue. I have also verified that the issue persists with LPOT 1.6 and TensorFlow 2.6.
This error happens when I load the model in saved_model format. I looked up the error online and found some possible solutions:
OpenNMT/OpenNMT-tf#842
tensorflow/tensorflow#28287 (comment)

On the other hand, when I tried using a Keras session by changing lines 92 and 303 of model.py to

# LSTM as custom object
model = tf.keras.models.load_model(model, custom_objects={'LSTMCell':tf.compat.v1.nn.rnn_cell.LSTMCell})

A different error occurs:

2021-08-27 01:39:21 [INFO] Pass GrapplerOptimizer elapsed time: 4117.61 ms
2021-08-27 01:39:21 [INFO] Pass RemoveTrainingNodesOptimizer elapsed time: 301.24 ms
Traceback (most recent call last):
  File "main.py", line 118, in <module>
    evaluate_opt_graph.run()
  File "main.py", line 102, in run
    q_model = quantizer()
  File "/usr/local/lib/python3.7/dist-packages/lpot/experimental/quantization.py", line 201, in __call__
    return super(Quantization, self).__call__()
  File "/usr/local/lib/python3.7/dist-packages/lpot/experimental/component.py", line 188, in __call__
    self.pre_process()
  File "/usr/local/lib/python3.7/dist-packages/lpot/experimental/quantization.py", line 135, in pre_process
    _resume)
  File "/usr/local/lib/python3.7/dist-packages/lpot/strategy/basic.py", line 85, in __init__
    dicts)
  File "/usr/local/lib/python3.7/dist-packages/lpot/strategy/strategy.py", line 179, in __init__
    self.capability = self.adaptor.query_fw_capability(model)
  File "/usr/local/lib/python3.7/dist-packages/lpot/adaptor/tensorflow.py", line 491, in query_fw_capability
    self.pre_optimized_model = self.pre_optimizer_handle.get_optimized_model()
  File "/usr/local/lib/python3.7/dist-packages/lpot/utils/utility.py", line 201, in fi
    res = func(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/lpot/adaptor/tf_utils/graph_rewriter/generic/pre_optimize.py", line 99, in get_optimized_model
    self._tmp_graph_def = SplitSharedInputOptimizer(self._tmp_graph_def).do_transformation()
  File "/usr/local/lib/python3.7/dist-packages/lpot/utils/utility.py", line 201, in fi
    res = func(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/lpot/adaptor/tf_utils/graph_rewriter/generic/split_shared_input.py", line 49, in do_transformation
    new_input_node.CopyFrom(graph_info[input_node_name].node)
KeyError: '^model/rnn/while/body/_49/lstm_cell_1/lstm_cell_1/BiasAdd/ReadVariableOp'
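
A note on the key in this KeyError: in a TensorFlow GraphDef, an input string that starts with ^ denotes a control-dependency input, and a :N suffix denotes an output index, so the ^ is not part of the node name itself. Whether the lookup fails because the prefix is not stripped is only a guess, not something confirmed in this thread. The helper below (split_input_name, a made-up name) is a small sketch of how such input strings can be normalized before being used as dictionary keys.

```python
def split_input_name(input_name):
    """Split a GraphDef input string into (node_name, output_index, is_control).

    '^foo'  -> ('foo', None, True)   control dependency, no tensor output
    'foo:1' -> ('foo', 1, False)     second output of node foo
    'foo'   -> ('foo', 0, False)     first output of node foo
    """
    if input_name.startswith("^"):
        return input_name[1:], None, True
    node_name, sep, index = input_name.rpartition(":")
    if sep and index.isdigit():
        return node_name, int(index), False
    return input_name, 0, False


if __name__ == "__main__":
    key = "^model/rnn/while/body/_49/lstm_cell_1/lstm_cell_1/BiasAdd/ReadVariableOp"
    print(split_input_name(key))
```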

I want to add that the node name seems different from the one I see in Netron. In Netron, all my model names start with 'REG_Net', which is the name I assigned, but in the error message, the node name looks like it has not been renamed at all. I wonder if this is a compatibility issue?
Screen Shot 2021-08-27 at 09 55 38

Thank you!

@guomingz
Contributor

Netron doesn't show the real model for the saved model format, especially when you open saved_model.pb directly.
That format is entirely different from the frozen pb format.
For the issue you mentioned, I suggest focusing on the error below first.

tensorflow.python.framework.errors_impl.FailedPreconditionError: Could not find variable 52/kernel. This could mean that the variable has been deleted. In TF1, it can also mean the variable is uninitialized. Debug info: container=localhost, status=Not found: Container localhost does not exist. (Could not find resource: localhost/52/kernel)
[[{{node model/52/Conv2D/ReadVariableOp}}]]

You may try disabling this line https://github.com/intel/lpot/blob/master/lpot/adaptor/tf_utils/graph_rewriter/generic/pre_optimize.py#L124 and see whether the issue goes away.

@chensuyue
Contributor

No feedback for over 2 weeks, so closing for now. Please reopen if the issue is still there.
