"Invalid argument: Index out of range using input dim 0; input has only 0 dims" during ssd300 model training #375

Open
jessicametzger opened this issue Feb 4, 2021 · 0 comments


@jessicametzger

I am using ssd_keras with a TensorFlow 1.15 backend (I was originally using TensorFlow 2.20 but ran into this issue), and it throws an InvalidArgumentError the moment training starts. The error originates deep inside the TensorFlow backend and is almost impossible to trace.

Full stack trace

As soon as I call model.fit(...) in the ssd300_training.ipynb tutorial, I get the following very long message:

Epoch 00001: LearningRateScheduler reducing learning rate to 0.001.
Epoch 1/120

---------------------------------------------------------------------------
InternalError                             Traceback (most recent call last)
<ipython-input-9-1326232784a4> in <module>
     10                               validation_data=val_generator,
     11                               validation_steps=ceil(val_dataset_size/batch_size),
---> 12                               initial_epoch=initial_epoch)

~/anaconda3/envs/tf1gpu/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
    725         max_queue_size=max_queue_size,
    726         workers=workers,
--> 727         use_multiprocessing=use_multiprocessing)
    728 
    729   def evaluate(self,

~/anaconda3/envs/tf1gpu/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_generator.py in fit(self, model, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing)
    601         shuffle=shuffle,
    602         initial_epoch=initial_epoch,
--> 603         steps_name='steps_per_epoch')
    604 
    605   def evaluate(self,

~/anaconda3/envs/tf1gpu/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_generator.py in model_iteration(model, data, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, validation_freq, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch, mode, batch_size, steps_name, **kwargs)
    263 
    264       is_deferred = not model._is_compiled
--> 265       batch_outs = batch_function(*batch_data)
    266       if not isinstance(batch_outs, list):
    267         batch_outs = [batch_outs]

~/anaconda3/envs/tf1gpu/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py in train_on_batch(self, x, y, sample_weight, class_weight, reset_metrics)
   1015       self._update_sample_weight_modes(sample_weights=sample_weights)
   1016       self._make_train_function()
-> 1017       outputs = self.train_function(ins)  # pylint: disable=not-callable
   1018 
   1019     if reset_metrics:

~/anaconda3/envs/tf1gpu/lib/python3.7/site-packages/tensorflow_core/python/keras/backend.py in __call__(self, inputs)
   3474 
   3475     fetched = self._callable_fn(*array_vals,
-> 3476                                 run_metadata=self.run_metadata)
   3477     self._call_fetch_callbacks(fetched[-len(self._fetches):])
   3478     output_structure = nest.pack_sequence_as(

~/anaconda3/envs/tf1gpu/lib/python3.7/site-packages/tensorflow_core/python/client/session.py in __call__(self, *args, **kwargs)
   1470         ret = tf_session.TF_SessionRunCallable(self._session._session,
   1471                                                self._handle, args,
-> 1472                                                run_metadata_ptr)
   1473         if run_metadata:
   1474           proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

InternalError: 2 root error(s) found.
  (0) Internal: Dst tensor is not initialized.
	 [[{{node loss/conv7_2/kernel/Regularizer/Square/ReadVariableOp}}]]
	 [[training/SGD/gradients/gradients/conv1_1/BiasAdd_grad/BiasAddGrad/_545]]
  (1) Internal: Dst tensor is not initialized.
	 [[{{node loss/conv7_2/kernel/Regularizer/Square/ReadVariableOp}}]]
0 successful operations.
0 derived errors ignored.

System info

  • Ubuntu 20.04.1
  • tensorflow-gpu 1.15
  • keras: using tf.keras through tensorflow-gpu (I converted the code to tf.keras in the standard way)
  • Which commit: latest
  • GPU: NVIDIA Corporation TU104GL [Quadro RTX 5000] (the error happens whether I run on the GPU or the CPU)
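Since "latest" and "converted in a standard way" are hard to reproduce exactly, a small helper along these lines could print the exact versions installed in the environment where the error occurs. This is only a sketch: `environment_report` is a hypothetical name (not part of ssd_keras or TensorFlow), and it assumes each package exposes a `__version__` attribute, falling back to "unknown" otherwise.

```python
import importlib
import platform


# Hypothetical helper, not part of ssd_keras: collect exact versions for a bug report.
def environment_report(packages=("tensorflow", "keras", "h5py", "numpy")):
    """Return a dict mapping component name -> version string."""
    report = {
        "python": platform.python_version(),
        "os": platform.platform(),
    }
    for name in packages:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # The package is not importable in this environment.
            report[name] = "not installed"
            continue
        # Most packages expose __version__; fall back to "unknown" if one does not.
        report[name] = str(getattr(module, "__version__", "unknown"))
    return report
```

Running `for k, v in environment_report().items(): print(k, v)` inside the same conda environment as the notebook (e.g. `tf1gpu`) would show which TensorFlow and Keras builds are actually being picked up.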

Reproducible example

The error happens whenever I call model.fit(...) or model.fit_generator(...) on an ssd300 model with the TF1 backend, on both CPU and GPU. For example, running the ssd300_training.ipynb tutorial produces this error.

Sorry to open two issues at once. I've been trying to work through both of them for a while but have found no solutions.
