Problems with multi-GPU #118

Open
nick-torenvliet opened this issue Oct 8, 2020 · 0 comments

The code runs fine on the CPU, so my Docker container is working.
To run on multiple GPUs, I had to pass batch_size explicitly at line 120 of capsulenet-multi-gpu.py, because the call there raised an error without it.

Now when I run the code...

python capsulenet-multi-gpu.py --gpus 4 --batch_size 300

I get warnings such as:
200/200 [==============================] - ETA: 0s - loss: 0.8408 - capsnet_loss: 0.8094 - decoder_loss: 0.0801
WARNING:tensorflow:Model was constructed with shape (300, 28, 28, 1) for input Tensor("input_1:0", shape=(300, 28, 28, 1), dtype=float32), but it was called on an input with incompatible shape (75, 28, 28, 1).
WARNING:tensorflow:Model was constructed with shape (300, 28, 28, 1) for input Tensor("input_1:0", shape=(300, 28, 28, 1), dtype=float32), but it was called on an input with incompatible shape (75, 28, 28, 1).
WARNING:tensorflow:Model was constructed with shape (300, 28, 28, 1) for input Tensor("input_1:0", shape=(300, 28, 28, 1), dtype=float32), but it was called on an input with incompatible shape (75, 28, 28, 1).
WARNING:tensorflow:Model was constructed with shape (300, 28, 28, 1) for input Tensor("input_1:0", shape=(300, 28, 28, 1), dtype=float32), but it was called on an input with incompatible shape (75, 28, 28, 1).
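
The 75 in these warnings is consistent with the batch of 300 being split evenly across the 4 GPUs. A minimal sketch of that arithmetic, assuming an even split per GPU; `per_replica_batch` is a hypothetical helper for illustration, not part of capsulenet-multi-gpu.py or Keras:

```python
def per_replica_batch(global_batch, num_gpus):
    """Return the per-GPU batch size, assuming an even split (hypothetical helper)."""
    if global_batch % num_gpus != 0:
        raise ValueError("global batch size must be divisible by the number of GPUs")
    return global_batch // num_gpus


input_shape = (28, 28, 1)      # per-sample shape taken from the warning text
global_batch, gpus = 300, 4    # values from the command line above

# Shape each GPU replica would see under an even split.
replica_shape = (per_replica_batch(global_batch, gpus),) + input_shape
print(replica_shape)  # (75, 28, 28, 1)
```

If this reading is right, the warning is triggered because the model's input was built with a fixed batch dimension of 300 while each replica receives 75 samples.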

And a final error:
Traceback (most recent call last):
  File "capsulenet-multi-gpu.py", line 131, in <module>
    train(model=multi_model, data=((x_train, y_train), (x_test, y_test)), args=args)
  File "capsulenet-multi-gpu.py", line 67, in train
    callbacks=[log, tb, lr_decay])
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py", line 1479, in fit_generator
    initial_epoch=initial_epoch)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py", line 66, in _method_wrapper
    return method(self, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py", line 872, in fit
    return_dict=True)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py", line 66, in _method_wrapper
    return method(self, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py", line 1081, in evaluate
    tmp_logs = test_function(iterator)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 580, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 650, in _call
    return self._concrete_stateful_fn._filtered_call(canon_args, canon_kwds) # pylint: disable=protected-access
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1665, in _filtered_call
    self.captured_inputs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1746, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 598, in call
    ctx=ctx)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Expected size[0] in [0, 32], but got 75
     [[node model_3/lambda/Slice (defined at capsulenet-multi-gpu.py:67) ]]
     [[model_3/model_2/digitcaps/map/while/LoopCond/_75/_132]]
  (1) Invalid argument: Expected size[0] in [0, 32], but got 75
     [[node model_3/lambda/Slice (defined at capsulenet-multi-gpu.py:67) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference_test_function_12079]

Function call stack:
test_function -> test_function

2020-10-08 15:37:53.123350: W tensorflow/core/kernels/data/generator_dataset_op.cc:103] Error occurred when finalizing GeneratorDataset iterator: Failed precondition: Python interpreter state is not initialized. The process may be terminated.
[[{{node PyFunc}}]]
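
One possible reading of the `Expected size[0] in [0, 32], but got 75` message, not confirmed: the slice size of 75 is fixed from the training batch (300 / 4), while the batch reaching the evaluation step has only 32 rows. The sketch below only mimics that kind of bounds check in plain Python; `check_slice` is a hypothetical helper, and the assumption that 32 is the evaluation batch size comes from the error text alone.

```python
def check_slice(dim, begin, size):
    """Mimic a slice bounds check along one axis (hypothetical helper, not TensorFlow code)."""
    max_size = dim - begin
    if not 0 <= size <= max_size:
        raise ValueError(f"Expected size[0] in [0, {max_size}], but got {size}")
    return (begin, begin + size)


per_gpu = 300 // 4  # 75, assumed to be fixed when the multi-GPU model was built

# A training batch of 300 rows can supply a 75-row slice.
print(check_slice(300, 0, per_gpu))

# A 32-row batch cannot, which reproduces the shape of the message in the log.
try:
    check_slice(32, 0, per_gpu)
except ValueError as err:
    print(err)
```

Under this assumption, either the evaluation batch would need to match the training batch size, or the slice size would need to be computed from the actual batch at run time rather than fixed in advance.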

Is there a quick fix for this?
