This repository has been archived by the owner on Jul 7, 2023. It is now read-only.

RuntimeError: There was no new checkpoint after the training. Eval status: missing checkpoint #1930

Open
919294AkshatSharma opened this issue Jun 27, 2023 · 0 comments


Description

A RuntimeError occurs while training with the following command:

t2t-trainer --generate_data --data_dir=/t2t_data --output_dir=/t2t_train/deque --problem=text2text_copyable_tokens --model=neural_deque_model --hparams_set=neural_deque --train_steps=100 --eval_steps=5

Environment information

OS: Ubuntu 18.04.5

$ pip freeze | grep tensor

mesh-tensorflow==0.1.21
tensor2tensor==1.15.7
tensorboard==1.15.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tensorflow==1.15.0
tensorflow-addons==0.19.0
tensorflow-datasets==3.2.1
tensorflow-estimator==1.15.1
tensorflow-gan==2.1.0
tensorflow-hub==0.13.0
tensorflow-io-gcs-filesystem==0.32.0
tensorflow-metadata==1.12.0
tensorflow-probability==0.7.0
tensorstore==0.1.28

$ python -V
Python 3.7.12

For bugs: reproduction and error logs

# Steps to reproduce:
...
# Error logs:
Traceback (most recent call last):
  File "/opt/conda/envs/NeuralStack/bin/t2t-trainer", line 35, in <module>
    tf.app.run(main)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/snoop/tracer.py", line 173, in simple_wrapper
    return function(*args, **kwargs)
  File "/opt/conda/envs/NeuralStack/bin/t2t-trainer", line 30, in main
    t2t_trainer.main(argv)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensor2tensor/bin/t2t_trainer.py", line 418, in main
    execute_schedule(exp)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensor2tensor/bin/t2t_trainer.py", line 371, in execute_schedule
    getattr(exp, FLAGS.schedule)()
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensor2tensor/utils/trainer_lib.py", line 468, in continuous_train_and_eval
    self._eval_spec)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 473, in train_and_evaluate
    return executor.run()
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 613, in run
    return self.run_local()
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 714, in run_local
    saving_listeners=saving_listeners)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 370, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1161, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1195, in _train_model_default
    saving_listeners)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1495, in _train_with_estimator_spec
    any_step_done = True
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 861, in __exit__
    self._close_internal(exception_type)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 894, in _close_internal
    h.end(self._coordinated_creator.tf_sess)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_core/python/training/basic_session_run_hooks.py", line 600, in end
    self._save(session, last_step)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_core/python/training/basic_session_run_hooks.py", line 619, in _save
    if l.after_save(session, step):
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 519, in after_save
    self._evaluate(global_step_value)  # updates self.eval_result
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 544, in _evaluate
    'Eval status: {}'.format(self.eval_result.status))
RuntimeError: There was no new checkpoint after the training. Eval status: missing checkpoint
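
The last frame shows the error being raised in `_evaluate` in `tensorflow_estimator/python/estimator/training.py`, right after the final checkpoint save, when the evaluation status is not "evaluated". The "missing checkpoint" status suggests the estimator could not find a latest checkpoint in the output directory at that point. One quick way to see what was actually recorded in `--output_dir` is the stdlib-only sketch below. It assumes the TF1 convention of a text-format `checkpoint` state file containing a `model_checkpoint_path: "..."` line, and V2-format checkpoints (an `.index` file next to each prefix). The helper name `latest_checkpoint_prefix` is hypothetical and only approximates what `tf.train.latest_checkpoint` does.

```python
import os
import re


def latest_checkpoint_prefix(model_dir):
    """Return the checkpoint prefix recorded in model_dir/checkpoint, or None.

    Hypothetical helper. Assumes a text-format `checkpoint` state file with a
    line like: model_checkpoint_path: "model.ckpt-100"
    """
    state_file = os.path.join(model_dir, "checkpoint")
    if not os.path.isfile(state_file):
        return None  # no state file: no checkpoint was ever recorded here
    with open(state_file) as f:
        for line in f:
            m = re.match(r'\s*model_checkpoint_path:\s*"(.*)"\s*$', line)
            if m:
                path = m.group(1)
                # Relative prefixes are resolved against model_dir.
                if not os.path.isabs(path):
                    path = os.path.join(model_dir, path)
                # Assumes V2 checkpoints: the prefix is usable only if its
                # .index file exists on disk.
                if os.path.isfile(path + ".index"):
                    return path
                return None
    return None  # state file exists but names no checkpoint
```

Calling `print(latest_checkpoint_prefix("/t2t_train/deque"))` right after the failure would show whether any checkpoint prefix was recorded in the output directory used above.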