"Invalid argument: Index out of range using input dim 0; input has only 0 dims" during ssd300 model training #375

jessicametzger · 2021-02-04T06:36:53Z

I am using ssd_keras with a TensorFlow 1.15 backend (I was originally using TensorFlow 2.20 but ran into this issue), and it throws an InvalidArgumentError the moment I start training. The error originates deep in the TensorFlow backend and is almost impossible to trace.

Full stack trace

As soon as I call model.fit(...) in the ssd300_training.ipynb tutorial, I get the following very long message:

Epoch 00001: LearningRateScheduler reducing learning rate to 0.001.
Epoch 1/120

---------------------------------------------------------------------------
InternalError                             Traceback (most recent call last)
<ipython-input-9-1326232784a4> in <module>
     10                               validation_data=val_generator,
     11                               validation_steps=ceil(val_dataset_size/batch_size),
---> 12                               initial_epoch=initial_epoch)

~/anaconda3/envs/tf1gpu/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
    725         max_queue_size=max_queue_size,
    726         workers=workers,
--> 727         use_multiprocessing=use_multiprocessing)
    728 
    729   def evaluate(self,

~/anaconda3/envs/tf1gpu/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_generator.py in fit(self, model, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing)
    601         shuffle=shuffle,
    602         initial_epoch=initial_epoch,
--> 603         steps_name='steps_per_epoch')
    604 
    605   def evaluate(self,

~/anaconda3/envs/tf1gpu/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_generator.py in model_iteration(model, data, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, validation_freq, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch, mode, batch_size, steps_name, **kwargs)
    263 
    264       is_deferred = not model._is_compiled
--> 265       batch_outs = batch_function(*batch_data)
    266       if not isinstance(batch_outs, list):
    267         batch_outs = [batch_outs]

~/anaconda3/envs/tf1gpu/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py in train_on_batch(self, x, y, sample_weight, class_weight, reset_metrics)
   1015       self._update_sample_weight_modes(sample_weights=sample_weights)
   1016       self._make_train_function()
-> 1017       outputs = self.train_function(ins)  # pylint: disable=not-callable
   1018 
   1019     if reset_metrics:

~/anaconda3/envs/tf1gpu/lib/python3.7/site-packages/tensorflow_core/python/keras/backend.py in __call__(self, inputs)
   3474 
   3475     fetched = self._callable_fn(*array_vals,
-> 3476                                 run_metadata=self.run_metadata)
   3477     self._call_fetch_callbacks(fetched[-len(self._fetches):])
   3478     output_structure = nest.pack_sequence_as(

~/anaconda3/envs/tf1gpu/lib/python3.7/site-packages/tensorflow_core/python/client/session.py in __call__(self, *args, **kwargs)
   1470         ret = tf_session.TF_SessionRunCallable(self._session._session,
   1471                                                self._handle, args,
-> 1472                                                run_metadata_ptr)
   1473         if run_metadata:
   1474           proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

InternalError: 2 root error(s) found.
  (0) Internal: Dst tensor is not initialized.
	 [[{{node loss/conv7_2/kernel/Regularizer/Square/ReadVariableOp}}]]
	 [[training/SGD/gradients/gradients/conv1_1/BiasAdd_grad/BiasAddGrad/_545]]
  (1) Internal: Dst tensor is not initialized.
	 [[{{node loss/conv7_2/kernel/Regularizer/Square/ReadVariableOp}}]]
0 successful operations.
0 derived errors ignored.
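Both root errors name the same op. As a hypothetical triage aid (stdlib only, not part of ssd_keras or TensorFlow), the node names can be pulled out of a TF1 error message like this. The second path component of the node name (conv7_2 here) is the layer whose kernel regularizer is being read when the error fires.

```python
import re

# Matches the op names TensorFlow prints inside [[{{node ...}}]] markers.
# Entries without the "{{node" prefix (e.g. the BiasAddGrad line) are skipped.
NODE_RE = re.compile(r"\[\[\{\{node ([^}]+)\}\}\]\]")


def failing_nodes(error_text):
    """Return the unique node names reported in an error message, in order."""
    seen = []
    for name in NODE_RE.findall(error_text):
        if name not in seen:
            seen.append(name)
    return seen


message = """InternalError: 2 root error(s) found.
  (0) Internal: Dst tensor is not initialized.
\t [[{{node loss/conv7_2/kernel/Regularizer/Square/ReadVariableOp}}]]
\t [[training/SGD/gradients/gradients/conv1_1/BiasAdd_grad/BiasAddGrad/_545]]
  (1) Internal: Dst tensor is not initialized.
\t [[{{node loss/conv7_2/kernel/Regularizer/Square/ReadVariableOp}}]]
"""

for node in failing_nodes(message):
    # Second path component = the layer that owns the failing op.
    print(node, "-> layer", node.split("/")[1])
```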

System info

Ubuntu 20.04.1
tensorflow-gpu 1.15
Keras: using tf.keras through tensorflow-gpu; I converted the code to tf.keras in the standard way.
Which commit: latest
GPU: NVIDIA Corporation TU104GL [Quadro RTX 5000] (the error happens whether I run on the GPU or the CPU)
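The report does not spell out what "converted to tf.keras in the standard way" involved. A common approach is to swap standalone-Keras imports for their tf.keras equivalents. The sketch below shows that swap as a hypothetical helper (`to_tf_keras` is not part of ssd_keras). It deliberately leaves ssd_keras's own `keras_layers` / `keras_loss_function` modules untouched, since only the `keras` package itself changes.

```python
import re

# Hypothetical helper: rewrite standalone-Keras imports in a source string to
# tf.keras imports. Covers only the common import forms; an assumed
# illustration of the conversion, not the one actually used for this report.
_RULES = [
    # from keras.layers import Conv2D -> from tensorflow.keras.layers import Conv2D
    (re.compile(r"^([ \t]*)from keras((?:\.\w+)*) import ", re.M),
     r"\1from tensorflow.keras\2 import "),
    # import keras.backend as K -> import tensorflow.keras.backend as K
    (re.compile(r"^([ \t]*)import keras((?:\.\w+)+) as ", re.M),
     r"\1import tensorflow.keras\2 as "),
    # import keras -> from tensorflow import keras (keeps the name `keras` bound)
    (re.compile(r"^([ \t]*)import keras[ \t]*$", re.M),
     r"\1from tensorflow import keras"),
]


def to_tf_keras(source):
    """Apply each rewrite rule, in order, to the whole source string."""
    for pattern, replacement in _RULES:
        source = pattern.sub(replacement, source)
    return source


src = (
    "from keras.layers import Conv2D\n"
    "import keras.backend as K\n"
    "import keras\n"
    "from keras_layers.keras_layer_AnchorBoxes import AnchorBoxes\n"
)
print(to_tf_keras(src))
```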

Reproducible example

The error happens whenever I call model.fit(...) or model.fit_generator(...), where model is an ssd300 model and the backend is TensorFlow 1.x. It happens on both the CPU and the GPU. For example, running the ssd300_training.ipynb tutorial produces this error.
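The report does not say how the CPU-only runs were set up. One common way to keep TensorFlow off the GPU is to hide all CUDA devices through the `CUDA_VISIBLE_DEVICES` environment variable. The sketch below assumes this is done before TensorFlow is imported in the process. `hide_gpus` is a hypothetical helper name.

```python
import os


def hide_gpus():
    """Hide all CUDA devices so TensorFlow places every op on the CPU.

    Assumption: this runs before TensorFlow is imported in the process (in a
    notebook, restart the kernel and call it in the first cell); once CUDA has
    been initialized, changing the variable has no effect.
    """
    # An invalid device index such as -1 leaves CUDA with no visible devices.
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
    return os.environ["CUDA_VISIBLE_DEVICES"]


hide_gpus()
# import tensorflow as tf  # import only after the variable is set
```

If the same InternalError still appears with no visible GPU, that supports the report that the failure is not specific to the GPU.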

Sorry to open two issues at once. I've been trying to work through both of these for a while but have found no solutions.
