Loading two checkpoints gives NotFoundError #268

Open
munikarmanish opened this issue Feb 15, 2019 · 1 comment

Comments

@munikarmanish

Loading two checkpoints in the same Python session, as shown below, fails with a NotFoundError on the second one. Loading either checkpoint on its own works fine.

In [1]:  from luminoth import Detector

In [2]:  model1 = Detector('checkpoint1')
2019-02-15 05:11:39.359988: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
INFO:tensorflow:Restoring parameters from /root/.luminoth/checkpoints/bab8dccb2202/model.ckpt-140716
INFO:tensorflow:Loaded checkpoint.

In [3]:  model2 = Detector('checkpoint2')
INFO:tensorflow:Restoring parameters from /root/.luminoth/checkpoints/7207a39c0441/model.ckpt-17296
2019-02-15 05:12:02.990421: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key fasterrcnn_1/rcnn/fc_bbox/b not found in checkpoint
Traceback (most recent call last):
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: Key fasterrcnn_1/rcnn/fc_bbox/b not found in checkpoint
	 [[{{node save/RestoreV2}} = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1546, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key fasterrcnn_1/rcnn/fc_bbox/b not found in checkpoint
	 [[node save/RestoreV2 (defined at /root/.virtualenvs/test/lib/python3.5/site-packages/luminoth/utils/predicting.py:61)  = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Caused by op 'save/RestoreV2', defined at:
  File "<stdin>", line 1, in <module>
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/luminoth/tasks.py", line 71, in __init__
    self._network = PredictorNetwork(config)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/luminoth/utils/predicting.py", line 61, in __init__
    saver = tf.train.Saver(sharded=True, allow_empty=True)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1102, in __init__
    self.build()
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1114, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1151, in _build
    build_save=build_save, build_restore=build_restore)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 789, in _build_internal
    restore_sequentially, reshape)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 459, in _AddShardedRestoreOps
    name="restore_shard"))
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 406, in _AddRestoreOps
    restore_sequentially)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 862, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1466, in restore_v2
    shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

NotFoundError (see above for traceback): Key fasterrcnn_1/rcnn/fc_bbox/b not found in checkpoint
	 [[node save/RestoreV2 (defined at /root/.virtualenvs/test/lib/python3.5/site-packages/luminoth/utils/predicting.py:61)  = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1556, in restore
    names_to_keys = object_graph_key_mapping(save_path)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1830, in object_graph_key_mapping
    checkpointable.OBJECT_GRAPH_PROTO_KEY)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 371, in get_tensor
    status)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: Key _CHECKPOINTABLE_OBJECT_GRAPH not found in checkpoint

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/luminoth/tasks.py", line 71, in __init__
    self._network = PredictorNetwork(config)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/luminoth/utils/predicting.py", line 62, in __init__
    saver.restore(self.session, ckpt)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1562, in restore
    err, "a Variable name or other graph key that is missing")
tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key fasterrcnn_1/rcnn/fc_bbox/b not found in checkpoint
	 [[node save/RestoreV2 (defined at /root/.virtualenvs/test/lib/python3.5/site-packages/luminoth/utils/predicting.py:61)  = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Caused by op 'save/RestoreV2', defined at:
  File "<stdin>", line 1, in <module>
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/luminoth/tasks.py", line 71, in __init__
    self._network = PredictorNetwork(config)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/luminoth/utils/predicting.py", line 61, in __init__
    saver = tf.train.Saver(sharded=True, allow_empty=True)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1102, in __init__
    self.build()
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1114, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1151, in _build
    build_save=build_save, build_restore=build_restore)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 789, in _build_internal
    restore_sequentially, reshape)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 459, in _AddShardedRestoreOps
    name="restore_shard"))
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 406, in _AddRestoreOps
    restore_sequentially)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 862, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1466, in restore_v2
    shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key fasterrcnn_1/rcnn/fc_bbox/b not found in checkpoint
	 [[node save/RestoreV2 (defined at /root/.virtualenvs/test/lib/python3.5/site-packages/luminoth/utils/predicting.py:61)  = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

System:

  • Ubuntu 16.04 64-bit with 8 GB RAM, 4 vCPUs (no GPU)
  • python version: 3.5.2
  • luminoth version: 0.2.3

Basically, I need to load multiple checkpoints in memory and use one of them based on a parameter provided by the user.

@dshea89

dshea89 commented Apr 24, 2019

You need to use a separate TensorFlow graph for each model, both when loading it and when running predictions with it. If you are using Keras, you also need to use separate sessions. See:

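import tensorflow as tf
from luminoth import Detector

# Build each model in its own graph and session so that variable
# names from the two checkpoints do not collide.
graph1 = tf.Graph()
with graph1.as_default():
    session1 = tf.Session()
    with session1.as_default():
        model1 = Detector('checkpoint1')

graph2 = tf.Graph()
with graph2.as_default():
    session2 = tf.Session()
    with session2.as_default():
        model2 = Detector('checkpoint2')

# Re-enter the matching graph and session whenever a model is used.
# `img` is the input image, loaded beforehand.
with graph1.as_default():
    with session1.as_default():
        model1.predict(img)

with graph2.as_default():
    with session2.as_default():
        model2.predict(img)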
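
As for why separate graphs help: the `fasterrcnn_1/` prefix in the error suggests that the second model's variables were created in the same default graph as the first, where TensorFlow 1.x keeps scope names unique by appending a numeric suffix. The checkpoint only contains keys without that suffix, so the name-based restore fails. The toy model below illustrates this likely mechanism in pure Python, without TensorFlow. `ToyGraph`, `build_model`, `restore`, and the checkpoint values are all made up for illustration, and it assumes each checkpoint was saved from a fresh graph.

```python
class ToyGraph:
    """Toy stand-in for a TF 1.x graph: hands out unique scope names."""

    def __init__(self):
        self._counts = {}

    def unique_scope(self, name):
        # The first use keeps the plain name; later uses get _1, _2, ...
        n = self._counts.get(name, 0)
        self._counts[name] = n + 1
        return name if n == 0 else "%s_%d" % (name, n)


def build_model(graph):
    """Return the variable names a model would create in `graph`."""
    scope = graph.unique_scope("fasterrcnn")
    return [scope + "/rcnn/fc_bbox/b"]


def restore(variable_names, checkpoint):
    """Look up every variable by name, as a name-based restore does."""
    for name in variable_names:
        if name not in checkpoint:
            raise KeyError("Key %s not found in checkpoint" % name)
    return {name: checkpoint[name] for name in variable_names}


# Assumption: both checkpoints store their keys without a suffix.
checkpoint1 = {"fasterrcnn/rcnn/fc_bbox/b": 1.0}
checkpoint2 = {"fasterrcnn/rcnn/fc_bbox/b": 2.0}

# Shared graph: the second model is scoped as fasterrcnn_1 and fails.
shared = ToyGraph()
restore(build_model(shared), checkpoint1)
try:
    restore(build_model(shared), checkpoint2)
    shared_error = None
except KeyError as exc:
    shared_error = exc.args[0]

# One graph per model: the second model keeps the plain names.
restored2 = restore(build_model(ToyGraph()), checkpoint2)
```

In the shared-graph case `shared_error` holds the same "Key fasterrcnn_1/rcnn/fc_bbox/b not found in checkpoint" message as the traceback in the original post, while the per-graph restore succeeds.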

Reference:
https://stackoverflow.com/a/51290092
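
For the use case in the original post (keep several checkpoints in memory and pick one based on a user-supplied parameter), the graph and session pairing can be wrapped in a small registry. This is only a sketch under assumptions, not part of Luminoth or TensorFlow: `ModelRegistry`, `load`, and `predict` are hypothetical names, and the graph, session, and model constructors are passed in as callables so the helper itself does not import TensorFlow.

```python
class ModelRegistry:
    """Keeps each model together with the graph and session it was built in.

    Hypothetical helper, not part of Luminoth or TensorFlow.
    """

    def __init__(self, make_graph, make_session):
        # Assumption: with TensorFlow 1.x these would be tf.Graph and tf.Session.
        self._make_graph = make_graph
        self._make_session = make_session
        self._entries = {}

    def load(self, name, make_model):
        """Build a model inside a fresh graph and session and store it under `name`."""
        graph = self._make_graph()
        with graph.as_default():
            session = self._make_session()
            with session.as_default():
                model = make_model()
        self._entries[name] = (graph, session, model)

    def predict(self, name, *args, **kwargs):
        """Run the model stored under `name` inside its own graph and session."""
        graph, session, model = self._entries[name]
        with graph.as_default(), session.as_default():
            return model.predict(*args, **kwargs)


# Intended usage with TensorFlow 1.x and Luminoth (not executed here):
#   import tensorflow as tf
#   from luminoth import Detector
#   registry = ModelRegistry(tf.Graph, tf.Session)
#   registry.load('first', lambda: Detector('checkpoint1'))
#   registry.load('second', lambda: Detector('checkpoint2'))
#   result = registry.predict(user_choice, img)
```

Storing the graph and session next to the model means the caller only has to pass the user's choice, and cannot accidentally run a model under the wrong graph.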
