merge_call called while defining a new graph or a tf.function. #882

Open
Jark5455 opened this issue Aug 23, 2023 · 0 comments

Hello, I am trying to create a basic TD3 agent, but I am getting the following very long error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/coordinator.py", line 293, in stop_on_exception
    yield
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/distribute/mirrored_run.py", line 387, in run
    self.main_result = self.main_fn(*self.main_args, **self.main_kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tf_agents/agents/tf_agent.py", line 330, in train
    loss_info = self._train_fn(
  File "/usr/local/lib/python3.8/dist-packages/tf_agents/utils/common.py", line 188, in with_check_resource_vars
    return fn(*fn_args, **fn_kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tf_agents/agents/td3/td3_agent.py", line 316, in _train
    tf.cond(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/usr/local/lib/python3.8/dist-packages/tf_agents/agents/td3/td3_agent.py", line 311, in optimize_actor
    self._apply_gradients(actor_grads, trainable_actor_variables,
  File "/usr/local/lib/python3.8/dist-packages/tf_agents/agents/td3/td3_agent.py", line 341, in _apply_gradients
    return optimizer.apply_gradients(grads_and_vars)
  File "/usr/local/lib/python3.8/dist-packages/keras/src/optimizers/optimizer.py", line 1229, in apply_gradients
    grads_and_vars = self.aggregate_gradients(grads_and_vars)
  File "/usr/local/lib/python3.8/dist-packages/keras/src/optimizers/optimizer.py", line 1191, in aggregate_gradients
    return optimizer_utils.all_reduce_sum_gradients(grads_and_vars)
  File "/usr/local/lib/python3.8/dist-packages/keras/src/optimizers/utils.py", line 42, in all_reduce_sum_gradients
    reduced = tf.distribute.get_replica_context().merge_call(
RuntimeError: `merge_call` called while defining a new graph or a tf.function. This can often happen if the function `fn` passed to `strategy.run()` contains a nested `@tf.function`, and the nested `@tf.function` contains a synchronization point, such as aggregating gradients (e.g, optimizer.apply_gradients), or if the function `fn` uses a control flow statement which contains a synchronization point in the body. Such behaviors are not yet supported. Instead, please avoid nested `tf.function`s or control flow statements that may potentially cross a synchronization boundary, for example, wrap the `fn` passed to `strategy.run` or the entire `strategy.run` inside a `tf.function` or move the control flow out of `fn`. If you are subclassing a `tf.keras.Model`, please avoid decorating overridden methods `test_step` and `train_step` in `tf.function`.
INFO:tensorflow:Error reported to Coordinator: `merge_call` called while defining a new graph or a tf.function. This can often happen if the function `fn` passed to `strategy.run()` contains a nested `@tf.function`, and the nested `@tf.function` contains a synchronization point, such as aggregating gradients (e.g, optimizer.apply_gradients), or if the function `fn` uses a control flow statement which contains a synchronization point in the body. Such behaviors are not yet supported. Instead, please avoid nested `tf.function`s or control flow statements that may potentially cross a synchronization boundary, for example, wrap the `fn` passed to `strategy.run` or the entire `strategy.run` inside a `tf.function` or move the control flow out of `fn`. If you are subclassing a `tf.keras.Model`, please avoid decorating overridden methods `test_step` and `train_step` in `tf.function`.
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/coordinator.py", line 293, in stop_on_exception
    yield
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/distribute/mirrored_run.py", line 387, in run
    self.main_result = self.main_fn(*self.main_args, **self.main_kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tf_agents/agents/tf_agent.py", line 330, in train
    loss_info = self._train_fn(
  File "/usr/local/lib/python3.8/dist-packages/tf_agents/utils/common.py", line 188, in with_check_resource_vars
    return fn(*fn_args, **fn_kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tf_agents/agents/td3/td3_agent.py", line 316, in _train
    tf.cond(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/usr/local/lib/python3.8/dist-packages/tf_agents/agents/td3/td3_agent.py", line 311, in optimize_actor
    self._apply_gradients(actor_grads, trainable_actor_variables,
  File "/usr/local/lib/python3.8/dist-packages/tf_agents/agents/td3/td3_agent.py", line 341, in _apply_gradients
    return optimizer.apply_gradients(grads_and_vars)
  File "/usr/local/lib/python3.8/dist-packages/keras/src/optimizers/optimizer.py", line 1229, in apply_gradients
    grads_and_vars = self.aggregate_gradients(grads_and_vars)
  File "/usr/local/lib/python3.8/dist-packages/keras/src/optimizers/optimizer.py", line 1191, in aggregate_gradients
    return optimizer_utils.all_reduce_sum_gradients(grads_and_vars)
  File "/usr/local/lib/python3.8/dist-packages/keras/src/optimizers/utils.py", line 42, in all_reduce_sum_gradients
    reduced = tf.distribute.get_replica_context().merge_call(
RuntimeError: `merge_call` called while defining a new graph or a tf.function. This can often happen if the function `fn` passed to `strategy.run()` contains a nested `@tf.function`, and the nested `@tf.function` contains a synchronization point, such as aggregating gradients (e.g, optimizer.apply_gradients), or if the function `fn` uses a control flow statement which contains a synchronization point in the body. Such behaviors are not yet supported. Instead, please avoid nested `tf.function`s or control flow statements that may potentially cross a synchronization boundary, for example, wrap the `fn` passed to `strategy.run` or the entire `strategy.run` inside a `tf.function` or move the control flow out of `fn`. If you are subclassing a `tf.keras.Model`, please avoid decorating overridden methods `test_step` and `train_step` in `tf.function`.

This error only occurs when I call tf_agents.train.utils.strategy_utils.get_strategy() with use_gpu=True.

The error is triggered when I instantiate tf_agents.train.learner.Learner. My TensorFlow version is 2.13.0 and my tf-agents version is 0.17.0.

The error also occurs with TensorFlow 2.12.0 and tf-agents 0.16.0.
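
A minimal sketch of the kind of setup described above, under stated assumptions. It is not the original code from this report: `build_learner` and `make_agent` are hypothetical names introduced for illustration, and `make_agent` stands in for whatever code builds the TD3 agent. The sketch assumes `get_strategy` takes `tpu` and `use_gpu` arguments and that the agent is created inside the strategy scope.

```python
def build_learner(make_agent, experience_dataset_fn, root_dir, use_gpu=True):
    """Build a Learner under a distribution strategy (illustrative sketch).

    Args:
      make_agent: hypothetical callable taking the train step counter and
        returning an initialized agent (for example a TD3 agent).
      experience_dataset_fn: callable returning the training dataset.
      root_dir: directory for checkpoints and summaries.
      use_gpu: forwarded to strategy_utils.get_strategy.
    """
    # Imported lazily so the sketch can be loaded without tf-agents installed.
    from tf_agents.train import learner
    from tf_agents.train.utils import strategy_utils
    from tf_agents.train.utils import train_utils

    # With use_gpu=True this is assumed to return a multi-replica GPU strategy.
    strategy = strategy_utils.get_strategy(tpu=False, use_gpu=use_gpu)

    # Variables (train step, networks, optimizers) are created in scope.
    with strategy.scope():
        train_step = train_utils.create_train_step()
        agent = make_agent(train_step)

    # Per the report, constructing the Learner is where the error surfaces.
    return learner.Learner(
        root_dir=root_dir,
        train_step=train_step,
        agent=agent,
        experience_dataset_fn=experience_dataset_fn,
        strategy=strategy,
    )
```

Calling `build_learner(..., use_gpu=False)` with the same agent factory would be the comparison case, since the report says the error only appears with the GPU strategy.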
