Improve transaction-level retry logic #879

Open
daniel-sanche opened this issue Mar 9, 2024 · 1 comment
Labels
api: firestore Issues related to the googleapis/python-firestore API. priority: p3 Desirable enhancement or fix. May not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

@daniel-sanche
Contributor

When a transaction fails, the client retries the entire transaction. Several bugs have been identified in the retry logic:

  • only checks for ABORTED. Should also retry on CANCELLED, UNKNOWN, DEADLINE_EXCEEDED, INTERNAL, UNAVAILABLE, UNAUTHENTICATED, RESOURCE_EXHAUSTED
  • backoff is not supported
  • transaction calls Rollback on failure, even if BeginTransaction was never called
  • the first transaction id should be used for each retry. Currently it always uses the id of the latest retry
  • RESOURCE_EXHAUSTED errors should jump to the max backoff value immediately
  • gapic-level retries should be limited when called in the context of a transaction (BeginTransaction, Commit, BatchGetDocuments, RunQuery, RunAggregationQuery, etc)
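
The first five points above can be sketched as a single retry loop. This is a minimal illustration, not the library's implementation: `Code`, `RpcError`, `run_transaction`, the callback parameters, and the backoff constants are all hypothetical stand-ins, and "use the first transaction id" is read here as passing the first attempt's id as the retry reference on every later BeginTransaction. Limiting gapic-level retries is not covered.

```python
import enum
import time


class Code(enum.Enum):
    # Hypothetical stand-in for gRPC status codes.
    ABORTED = 1
    CANCELLED = 2
    UNKNOWN = 3
    DEADLINE_EXCEEDED = 4
    INTERNAL = 5
    UNAVAILABLE = 6
    UNAUTHENTICATED = 7
    RESOURCE_EXHAUSTED = 8
    INVALID_ARGUMENT = 9


class RpcError(Exception):
    # Hypothetical error type carrying a status code.
    def __init__(self, code):
        super().__init__(code.name)
        self.code = code


# Codes the issue says should trigger a transaction-level retry.
RETRYABLE = frozenset(Code) - {Code.INVALID_ARGUMENT}

# Assumed values, chosen only for illustration.
INITIAL_BACKOFF = 1.0
MAX_BACKOFF = 60.0
MULTIPLIER = 1.5


def run_transaction(begin, body, commit, rollback, max_attempts=5, sleep=time.sleep):
    first_id = None  # id of the first successfully begun transaction
    delay = INITIAL_BACKOFF
    for attempt in range(1, max_attempts + 1):
        txn_id = None
        try:
            # Always reference the first id, not the most recent one.
            txn_id = begin(retry_of=first_id)
            if first_id is None:
                first_id = txn_id
            result = body(txn_id)
            commit(txn_id)
            return result
        except RpcError as exc:
            # Roll back only if BeginTransaction actually succeeded.
            if txn_id is not None:
                rollback(txn_id)
            if exc.code not in RETRYABLE or attempt == max_attempts:
                raise
            # RESOURCE_EXHAUSTED jumps straight to the maximum backoff.
            if exc.code is Code.RESOURCE_EXHAUSTED:
                delay = MAX_BACKOFF
            sleep(delay)
            delay = min(delay * MULTIPLIER, MAX_BACKOFF)
```

Passing `sleep` as a parameter keeps the loop testable without real delays. A production version would likely also add jitter to the delay.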
@daniel-sanche daniel-sanche added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p3 Desirable enhancement or fix. May not be included in next release. labels Mar 9, 2024
@daniel-sanche daniel-sanche self-assigned this Mar 9, 2024
@product-auto-label product-auto-label bot added the api: firestore Issues related to the googleapis/python-firestore API. label Mar 9, 2024
@daniel-sanche daniel-sanche added this to the Retry Audit milestone Mar 9, 2024
@breathe

breathe commented Mar 12, 2024

Is there any chance that one of the known bugs this issue is intended to address could be the cause of an exception like this?

Traceback (most recent call last):
  File "/app/.venv/lib/python3.10/site-packages/google/api_core/grpc_helpers.py", line 72, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "/app/.venv/lib/python3.10/site-packages/grpc/_channel.py", line 1161, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/app/.venv/lib/python3.10/site-packages/grpc/_channel.py", line 1004, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.INVALID_ARGUMENT
	details = "The referenced transaction has expired or is no longer valid."
	debug_error_string = "UNKNOWN:Error received from peer ipv4:142.250.152.95:443 {created_time:"2024-03-12T05:00:53.48562254+00:00", grpc_status:3, grpc_message:"The referenced transaction has expired or is no longer valid."}"
>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/zephyrus/cli.py", line 326, in analyze
    analysis.run()
  File "/app/zephyrus/lib/analysis.py", line 1520, in run
    update_enlighten_pbar(
  File "/app/zephyrus/lib/cloud/progress_bar.py", line 31, in update_enlighten_pbar
    _update_enlighten_pbar(
  File "/app/zephyrus/lib/cloud/progress_bar.py", line 109, in _update_enlighten_pbar
    update_task_output(
  File "/app/zephyrus/lib/cloud/firebase_transactions.py", line 388, in update_task_output
    raise e
  File "/app/zephyrus/lib/cloud/firebase_transactions.py", line 381, in update_task_output
    old_task_output_status, next_task_output_status = _update_task_output(
  File "/app/.venv/lib/python3.10/site-packages/google/cloud/firestore_v1/transaction.py", line 273, in __call__
    transaction._commit()
  File "/app/.venv/lib/python3.10/site-packages/google/cloud/firestore_v1/transaction.py", line 144, in _commit
    commit_response = _commit_with_retry(self._client, self._write_pbs, self._id)
  File "/app/.venv/lib/python3.10/site-packages/google/cloud/firestore_v1/transaction.py", line 339, in _commit_with_retry
    return client._firestore_api.commit(
  File "/app/.venv/lib/python3.10/site-packages/google/cloud/firestore_v1/services/firestore/client.py", line 1131, in commit
    response = rpc(
  File "/app/.venv/lib/python3.10/site-packages/google/api_core/gapic_v1/method.py", line 113, in __call__
    return wrapped_func(*args, **kwargs)
  File "/app/.venv/lib/python3.10/site-packages/google/api_core/retry.py", line 349, in retry_wrapped_func
    return retry_target(
  File "/app/.venv/lib/python3.10/site-packages/google/api_core/retry.py", line 191, in retry_target
    return target()
  File "/app/.venv/lib/python3.10/site-packages/google/api_core/timeout.py", line 120, in func_with_timeout
    return func(*args, **kwargs)
  File "/app/.venv/lib/python3.10/site-packages/google/api_core/grpc_helpers.py", line 74, in error_remapped_callable
    raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.InvalidArgument: 400 The referenced transaction has expired or is no longer valid.

I run a somewhat complicated function inside an @transaction-wrapped method that is the core of a cooperative task system ... I'm getting the above error where it's not expected. Conflicts should be relatively rare, and I'm not sure how the transaction execution could be blocked long enough for the transaction to expire (120s, I believe, is the default, which I haven't changed).
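
One way to check whether the wrapped function ever runs long enough to approach that expiry is to time each attempt. The following is only a sketch: `record_attempts` is a hypothetical helper, not part of python-firestore, and it measures only the function body, not the time spent in BeginTransaction or Commit.

```python
import functools
import time


def record_attempts(func, clock=time.monotonic):
    """Wrap func so the duration of every call is appended to wrapper.durations."""
    durations = []

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = clock()
        try:
            return func(*args, **kwargs)
        finally:
            # Record the duration even when the attempt raises.
            durations.append(clock() - start)

    wrapper.durations = durations
    return wrapper
```

Applying the wrapper to the function before it is made transactional would record one duration per attempt, so a long-running attempt or an unexpected number of retries shows up in `durations`.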
