This repository has been archived by the owner on Jul 5, 2021. It is now read-only.

cant load checkpoint files #225

Open
skywo1f opened this issue Oct 14, 2019 · 10 comments

Comments

@skywo1f

skywo1f commented Oct 14, 2019

I tried every file in the checkpoints folder:
model.ckpt.index
model.ckpt.meta
checkpoint
model.ckpt.data-00000-of-00001
None of them work:
Semantic-Segmentation-Suite/checkpoints/0295/model.ckpt.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator

@sweetboxwwy

Have you solved this problem?

@AI-ML-Enthusiast

@skywo1f @GeorgeSeif
Same problem. Can anyone suggest a fix?
I trained on my PC. Training is OK, but I cannot load the checkpoint file.

@millermuttu

same issue -

NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key BatchNorm/beta not found in checkpoint
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
[[Node: save/RestoreV2/_79 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_84_save/RestoreV2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
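
One way to narrow down a `Key ... not found in checkpoint` error is to compare the variable names the graph expects with the names actually stored in the checkpoint. The sketch below is only an illustration: `missing_keys` is a hypothetical helper, and both name lists are made-up sample data rather than output from this repository. In a real session the checkpoint names could probably be read with `tf.train.list_variables`, assuming the installed TensorFlow version provides it.

```python
def missing_keys(expected_names, checkpoint_names):
    """Return the expected variable names that have no entry in the checkpoint, sorted."""
    available = set(checkpoint_names)
    return sorted(name for name in expected_names if name not in available)


# Made-up sample data, for illustration only.
graph_names = ["BatchNorm/beta", "BatchNorm/gamma", "Conv/weights"]
ckpt_names = ["Conv/weights", "resnet_v2_101/conv1/weights"]

# Names listed here are the ones a restore would fail on.
print(missing_keys(graph_names, ckpt_names))
```

If the missing names all belong to one part of the network, the graph being built likely differs from the one that was trained, for example because a different model or dataset option was passed.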

@FZY2019

FZY2019 commented Dec 22, 2019

Is the problem solved?

@skywo1f
Author

skywo1f commented Dec 24, 2019

no

@Imperssonator

I had the same issue. In your case, you should just pass model.ckpt, i.e. the checkpoint prefix without the .index, .meta, or .data-00000-of-00001 suffix.
https://stackoverflow.com/questions/33759623/tensorflow-how-to-save-restore-a-model
Also, if you're not using CamVid, make sure to pass something to --dataset; otherwise it will default to the 32 class labels from the CamVid dataset. Hope that helps.

@wy9884255

You'd better try the full path, e.g. disk:/your_folder/model.ckpt

@Harikrishnan24

Is this problem solved?

@mtylerpreston

mtylerpreston commented May 15, 2020

I think I have the solution: just use model.ckpt, even if no such file exists.

I had the same struggle when trying to pass model.ckpt.meta etc. to resume training. Even though no file with that name exists in the directory where the checkpoint was saved, I just used model.ckpt and it worked.
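
To make the "pass model.ckpt" advice concrete, here is a minimal sketch that turns any of the files in the checkpoint folder into the prefix. `checkpoint_prefix` is a hypothetical helper, not part of this repository, and it assumes the usual `.index`, `.meta`, and `.data-NNNNN-of-NNNNN` suffixes seen in the file listing above.

```python
import re

# Suffixes that appear after the checkpoint prefix in the saved file names.
_SUFFIX = re.compile(r"\.(index|meta|data-\d{5}-of-\d{5})$")


def checkpoint_prefix(path):
    """Strip a trailing .index, .meta, or .data-NNNNN-of-NNNNN suffix, if present."""
    return _SUFFIX.sub("", path)


# Each of these prints the same prefix: checkpoints/0295/model.ckpt
print(checkpoint_prefix("checkpoints/0295/model.ckpt.data-00000-of-00001"))
print(checkpoint_prefix("checkpoints/0295/model.ckpt.meta"))
print(checkpoint_prefix("checkpoints/0295/model.ckpt"))
```

The resulting prefix is what a restore call such as `saver.restore(sess, prefix)` expects, which is why the path works even though no file is literally named model.ckpt.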

@Harikrishnan24

Harikrishnan24 commented May 17, 2020 via email
