BigTable: 'Table.mutate_rows' deadline exceeded for large mutations #7

Closed
cwbeitel opened this issue May 3, 2019 · 5 comments · Fixed by #157
Assignees
Labels
api: bigtable Issues related to the googleapis/python-bigtable API. type: docs Improvement to the documentation for an API.

Comments

cwbeitel commented May 3, 2019

Deadline exceeded for a large mutation of a Bigtable table. This is not necessarily a bug; perhaps it would be helpful to others to add this to the docs, automatically batch mutations by size, make the timeout configurable, or document large data transfers as a potential cause of exceeding the deadline.

Environment details

  • Cloud BigTable
  • Ubuntu 18.04
  • Python 3.6.4
  • google-cloud-bigtable==0.31.1

Steps to reproduce and code example

In the following, setting mutation_batch_size too large causes the deadline to be exceeded (here with tf.Example-serialized video examples of 4 frames, each 224x224). Not necessarily a bug if this is the expected behavior and users are expected to handle this kind of batching themselves.

import datetime
import hashlib

import tensorflow as tf


def iterable_dataset_from_file(filename):
  dataset = tf.data.TFRecordDataset(filename)
  iterator = dataset.make_initializable_iterator()
  next_element = iterator.get_next()

  with tf.Session() as sess:
    sess.run(iterator.initializer)

    i = 0
    while True:
      try:
        if i % 1000 == 0:
          print("Processed %s examples..." % i)
        yield sess.run(next_element)
        i += 1
      except tf.errors.OutOfRangeError:
        print("Ran out of examples (processed %s), exiting..." % i)
        break


def tfrecord_files_to_cbt_table(glob, table, selection, max_records=100000000,
                                mutation_batch_size=250):

  def new_mutation_batch():
    return [None for _ in range(mutation_batch_size)]

  files = tf.gfile.Glob(glob)

  for file_path in files:

    # mutation_index is the next free slot in the batch; reset it per file
    # along with the batch itself.
    mutation_index = 0
    row_mutation_batch = new_mutation_batch()

    for i, example in enumerate(iterable_dataset_from_file(file_path)):

      idx = hashlib.md5(example).hexdigest()

      # DEV: To check "shuffle" effect add the id suffix
      idx = "_".join([selection.prefix, idx, str(i)])

      row = table.row(idx)
      row.set_cell(column_family_id=selection.column_family,
                   column=selection.column_qualifier,
                   value=example,
                   timestamp=datetime.datetime.utcnow())

      row_mutation_batch[mutation_index] = row

      if mutation_index == (mutation_batch_size - 1):
        table.mutate_rows(row_mutation_batch)
        row_mutation_batch = new_mutation_batch()
        mutation_index = 0
      else:
        mutation_index += 1

    # Flush the partial final batch. Slots 0..mutation_index-1 are filled,
    # so slice up to mutation_index; slicing to mutation_index - 1 would
    # drop the last buffered row.
    final_mutation = row_mutation_batch[:mutation_index]
    if final_mutation:
      table.mutate_rows(final_mutation)

Stack trace

Traceback (most recent call last):
  File "/home/jovyan/.local/lib/python3.6/site-packages/google/api_core/grpc_helpers.py", line 79, in next
    return six.next(self._wrapped)
  File "/opt/conda/lib/python3.6/site-packages/grpc/_channel.py", line 341, in __next__
    return self._next()
  File "/opt/conda/lib/python3.6/site-packages/grpc/_channel.py", line 335, in _next
    raise self
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with (StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded)>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/jovyan/work/pcml/pcml/operations/tfrecord2bigtable.py", line 330, in <module>
    tf.app.run()
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "/home/jovyan/work/pcml/pcml/operations/tfrecord2bigtable.py", line 300, in main
    max_records=FLAGS.max_records)
  File "/home/jovyan/work/pcml/pcml/operations/tfrecord2bigtable.py", line 228, in tfrecord_files_to_cbt_table
    table.mutate_rows(row_mutation_batch)
  File "/home/jovyan/.local/lib/python3.6/site-packages/google/cloud/bigtable/table.py", line 423, in mutate_rows
    return retryable_mutate_rows(retry=retry)
  File "/home/jovyan/.local/lib/python3.6/site-packages/google/cloud/bigtable/table.py", line 571, in __call__
    mutate_rows()
  File "/home/jovyan/.local/lib/python3.6/site-packages/google/api_core/retry.py", line 270, in retry_wrapped_func
    on_error=on_error,
  File "/home/jovyan/.local/lib/python3.6/site-packages/google/api_core/retry.py", line 179, in retry_target
    return target()
  File "/home/jovyan/.local/lib/python3.6/site-packages/google/cloud/bigtable/table.py", line 634, in _do_mutate_retryable_rows
    for response in responses:
  File "/home/jovyan/.local/lib/python3.6/site-packages/google/api_core/grpc_helpers.py", line 81, in next
    six.raise_from(exceptions.from_grpc_error(exc), exc)
  File "<string>", line 3, in raise_from
google.api_core.exceptions.DeadlineExceeded: 504 Deadline Exceeded
tseaver changed the title from "BigTable table.mutate_rows deadline exceeded for large mutations" to "BigTable: 'Table.mutate_rows' deadline exceeded for large mutations" May 7, 2019
sduskis (Contributor) commented May 10, 2019

@crwilcox, @kolea2, @igorbernstein2: this is a problem in the GAPIC config. Currently, the default timeout is set to 60 seconds. Someone may need to increase the timeout for MutateRows calls.

cwbeitel (Author) commented:

@sduskis Sending data in smaller batches might be a best practice anyway, since a failure in one part of a large transfer could fail the whole transfer (versus, perhaps, retrying only the failed portion?).
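
For context on "retrying only the failed portion": Table.mutate_rows returns one status per row, so a caller can resend just the rows that did not come back OK. A minimal sketch under that assumption (mutate_with_retry is a hypothetical helper, not part of the library):

def mutate_with_retry(table, rows, max_attempts=3):
  # Hypothetical helper: resend only the rows whose per-row status came
  # back non-OK (mutate_rows returns one google.rpc Status per row).
  pending = rows
  for _ in range(max_attempts):
    statuses = table.mutate_rows(pending)
    failed = [row for row, status in zip(pending, statuses)
              if status.code != 0]  # code 0 is OK
    if not failed:
      return
    pending = failed
  raise RuntimeError("%d rows still failing after retries" % len(pending))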

sduskis (Contributor) commented May 10, 2019

@cwbeitel, I agree on best practices. In theory, you can use a MutationsBatcher from table.mutations_batcher(), which ought to encapsulate the best practices.

I also believe that we ought to review the default timeout value to make sure that it makes sense for this type of RPC.
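
A minimal sketch of the MutationsBatcher approach mentioned above (project, instance, table, and column-family names are placeholders, and the flush thresholds are illustrative, not recommendations):

import datetime
from google.cloud import bigtable

client = bigtable.Client(project="my-project", admin=True)
table = client.instance("my-instance").table("my-table")

# The batcher flushes automatically once flush_count rows or max_row_bytes
# bytes are buffered, whichever comes first, so callers need not size
# batches by hand.
batcher = table.mutations_batcher(flush_count=100, max_row_bytes=5 * 1024 * 1024)
for i, value in enumerate(values):  # values: assumed iterable of bytes payloads
  row = table.direct_row("row-%09d" % i)
  row.set_cell("cf1", b"payload", value, timestamp=datetime.datetime.utcnow())
  batcher.mutate(row)  # flushes automatically at the thresholds above
batcher.flush()  # send anything still buffered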

crwilcox transferred this issue from googleapis/google-cloud-python Jan 31, 2020
product-auto-label bot added the api: bigtable label Jan 31, 2020
yoshi-automation added the 🚨 This issue needs some love. and triage me labels Feb 3, 2020
crwilcox added the priority: p1 and type: docs labels Feb 4, 2020
frankyn removed the 🚨 This issue needs some love. and triage me labels Feb 4, 2020
yoshi-automation added the 🚨 This issue needs some love. label Feb 4, 2020
kolea2 removed the 🚨 This issue needs some love. and priority: p1 labels Feb 6, 2020
tseaver self-assigned this Oct 23, 2020
tseaver (Contributor) commented Oct 23, 2020

Relevant commits:

  • c38888d bumped the default timeout for the MutateRows RPC to 600 seconds, which seems like it should be sufficient.
  • 3169f10 added a mutation_timeout attribute to Table, and passes it through to the underlying GAPIC method. That value (default None) is passed through to the _RetryableMutateRowsWorker instance used to perform the actual MutateRows RPC call.

An odd side effect of how _RetryableMutateRowsWorker uses the timeout is that it scribbles on the mutate_rows key of the inner_api_calls of the table data client, IFF the wrapper is not already present. So, the first table will have its mutation_timeout value set on the client, but subsequent tables will not. :( A better implementation would probably be just to create the wrapper on the fly each time, if the timeout is not None. Or, even better, just pass through the timeout to the data_client.mutate_rows call, rather than mucking about with wrappers.
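
A hedged sketch of the two knobs described above, using the names from those commits (availability assumed as of the eventual 1.6.0 release; see the release notes below):

# (1) Per-table default from 3169f10: a mutation_timeout attribute on Table,
#     passed through to the underlying GAPIC method.
table = instance.table("my-table", mutation_timeout=600)

# (2) Per-call override: the "pass the timeout through" approach, which
#     shipped as the timeout argument in #157 (seconds; replaces the old
#     60-second GAPIC default for this call).
statuses = table.mutate_rows(rows, timeout=600.0)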

cwbeitel (Author) commented:

@tseaver Ah well done, the ability to configure a timeout seems like a good feature.

tseaver added a commit that referenced this issue Oct 23, 2020
Do *not* scribble on its internal API wrappers.

See:
#7 (comment)
tseaver added a commit that referenced this issue Oct 23, 2020
tseaver added a commit that referenced this issue Nov 12, 2020
Also, call data client's 'mutate_rows' directly -- do *not* scribble on its internal API wrappers.

See:
#7 (comment)

Closes #7
gcf-merge-on-green bot pushed a commit that referenced this issue Nov 16, 2020
This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [google-cloud-bigtable](https://togithub.com/googleapis/python-bigtable) | minor | `==1.5.1` -> `==1.6.0` |

---

### Release Notes

googleapis/python-bigtable

### [`v1.6.0`](https://togithub.com/googleapis/python-bigtable/blob/master/CHANGELOG.md) (2020-11-16)

[Compare Source](https://togithub.com/googleapis/python-bigtable/compare/v1.5.1...v1.6.0)

##### Features

-   add 'timeout' arg to 'Table.mutate_rows' ([#157](https://www.github.com/googleapis/python-bigtable/issues/157)) ([6d597a1](https://www.github.com/googleapis/python-bigtable/commit/6d597a1e5be05c993c9f86beca4c1486342caf94)), closes [#7](https://www.github.com/googleapis/python-bigtable/issues/7)
-   Backup Level IAM ([#160](https://www.github.com/googleapis/python-bigtable/issues/160)) ([44932cb](https://www.github.com/googleapis/python-bigtable/commit/44932cb8710e12279dbd4e9271577f8bee238980))

##### [1.5.1](https://www.github.com/googleapis/python-bigtable/compare/v1.5.0...v1.5.1) (2020-10-06)

##### Bug Fixes

-   harden version data gathering against DistributionNotFound ([#150](https://www.github.com/googleapis/python-bigtable/issues/150)) ([c815421](https://www.github.com/googleapis/python-bigtable/commit/c815421422f1c845983e174651a5292767cfe2e7))


---

### Renovate configuration

:date: **Schedule**: At any time (no schedule defined).

:vertical_traffic_light: **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

:recycle: **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

:no_bell: **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] If you want to rebase/retry this PR, check this box

---

This PR has been generated by [WhiteSource Renovate](https://renovate.whitesourcesoftware.com). View repository job log [here](https://app.renovatebot.com/dashboard#github/googleapis/python-bigtable).