botocore.errorfactory.NoSuchKey when old TF Events got deleted #6713

Open
shaowei-su opened this issue Jan 2, 2024 · 3 comments
Labels
core:notf Things related to No TensorFlow mode.

Comments

@shaowei-su

Consider Stack Overflow for getting support using TensorBoard—they have
a larger community with better searchability:

https://stackoverflow.com/questions/tagged/tensorboard

Do not use this template for setup, installation, or configuration
issues. Instead, use the “installation problem” issue template:

https://github.com/tensorflow/tensorboard/issues/new?template=installation_problem.md

To report a problem with TensorBoard itself, please fill out the
remainder of this template.

Environment information (required)

Please run diagnose_tensorboard.py (link below) in the same
environment from which you normally run TensorFlow/TensorBoard, and
paste the output here:

tensorboard==2.9.1

https://raw.githubusercontent.com/tensorflow/tensorboard/master/tensorboard/tools/diagnose_tensorboard.py

For browser-related issues, please additionally specify:

  • Browser type and version (e.g., Chrome 64.0.3282.140):
  • Screenshot, if it’s a visual issue:

Issue description

Please describe the bug as clearly as possible. How can we reproduce the
problem without additional resources (including external data files and
proprietary Python modules)?

When TensorBoard reads TFEvents files from S3, deleting old TFEvents files from the same logdir triggers event_file_loader exceptions like the following:

Exception in thread Reloader 15:
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/backend/event_processing/plugin_event_multiplexer.py", line 239, in Worker
    accumulator.Reload()
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/backend/event_processing/plugin_event_accumulator.py", line 183, in Reload
    for event in self._generator.Load():
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/backend/event_processing/directory_watcher.py", line 88, in Load
    for event in self._LoadInternal():
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/backend/event_processing/directory_watcher.py", line 118, in _LoadInternal
    for event in self._loader.Load():
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/backend/event_processing/event_file_loader.py", line 270, in Load
    for event in super(EventFileLoader, self).Load():
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/backend/event_processing/event_file_loader.py", line 244, in Load
    for record in super(LegacyEventFileLoader, self).Load():
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/backend/event_processing/event_file_loader.py", line 178, in Load
    yield next(self._iterator)
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/backend/event_processing/event_file_loader.py", line 109, in __next__
    self._reader.GetNext()
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/compat/tensorflow_stub/pywrap_tensorflow.py", line 207, in GetNext
    header_str = self._read(8)
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/compat/tensorflow_stub/pywrap_tensorflow.py", line 273, in _read
    new_data = self.file_handle.read(n)
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/compat/tensorflow_stub/io/gfile.py", line 727, in read
    (self.buff, self.continuation_token) = self.fs.read(
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/compat/tensorflow_stub/io/gfile.py", line 287, in read
    stream = s3.Object(bucket, path).get(**args)["Body"].read()
  File "/usr/local/lib/python3.10/dist-packages/boto3/resources/factory.py", line 520, in do_action
    response = action(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/boto3/resources/action.py", line 83, in __call__
    response = getattr(parent.meta.client, operation_name)(*args, **params)
  File "/usr/local/lib/python3.10/dist-packages/botocore/client.py", line 391, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.10/dist-packages/botocore/client.py", line 719, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.NoSuchKey: An error occurred (NoSuchKey) when calling the GetObject operation: The specified key does not exist.

This exception blocks any new events from being processed. A similar issue is #2634.
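
For illustration only, here is a minimal sketch of how a loader loop could tolerate a deleted file instead of letting the reloader thread die. It is not TensorBoard's actual code or fix. The names `is_no_such_key` and `load_tolerating_deletion` are hypothetical. The only thing it relies on from botocore is that client errors expose the parsed response as `exc.response`, with the S3 error code under `["Error"]["Code"]`.

```python
import logging
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def is_no_such_key(exc: Exception) -> bool:
    """Return True if `exc` looks like a botocore NoSuchKey error.

    botocore client errors carry the parsed response on `exc.response`;
    the S3 error code is stored under ["Error"]["Code"].
    """
    response = getattr(exc, "response", None)
    if not isinstance(response, dict):
        return False
    return response.get("Error", {}).get("Code") == "NoSuchKey"


def load_tolerating_deletion(
    events: Iterable[T],
    is_missing: Callable[[Exception], bool] = is_no_such_key,
) -> Iterator[T]:
    """Yield events until the source ends or its backing file disappears.

    Hypothetical helper: any error for which `is_missing` returns True is
    logged and treated as end-of-file; every other error is re-raised.
    """
    iterator = iter(events)
    while True:
        try:
            event = next(iterator)
        except StopIteration:
            return
        except Exception as exc:
            if is_missing(exc):
                logger.warning("Event file was deleted; skipping it: %s", exc)
                return
            raise
        yield event
```

With a wrapper like this around the event iterator, a deleted key would end that one file's stream while other runs keep reloading. Whether silently dropping the file is the right behavior is a design question for the maintainers.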

@arcra
Member

arcra commented Jan 4, 2024

To clarify, is the issue precisely the same as #2634? i.e. deleted events cause a crash instead of being ignored or handled gracefully somehow? But this is particularly how this issue manifests with the S3 filesystem?

Just to set expectations, support for S3 filesystem is best-effort, so I doubt we'll prioritize this, but I'll check with the team.

@arcra
Member

arcra commented Jan 4, 2024

Ah, and can you clarify if this is also when TensorFlow is not installed, like in #2634? Does installing TensorFlow work around the issue?

@shaowei-su
Author

To clarify, is the issue precisely the same as #2634? i.e. deleted events cause a crash instead of being ignored or handled gracefully somehow? But this is particularly how this issue manifests with the S3 filesystem?

Yes, this is exactly the same issue, occurring with the S3 filesystem.

Ah, and can you clarify if this is also when TensorFlow is not installed, like in #2634? Does installing TensorFlow work around the issue?

No native TF is installed in this case, and TensorBoard is using the stub version for I/O operations. Let me try it out with a compatible TF installed. Thanks for the suggestions!
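
For anyone wanting to check which mode applies in their environment, a minimal sketch follows. It assumes that TensorBoard falls back to its stub I/O layer when the `tensorflow` package cannot be imported. The helper name `native_tensorflow_available` is hypothetical, and finding the package does not guarantee that importing it will succeed.

```python
import importlib.util


def native_tensorflow_available() -> bool:
    """Return True if a `tensorflow` package can be located for import."""
    return importlib.util.find_spec("tensorflow") is not None


if __name__ == "__main__":
    if native_tensorflow_available():
        mode = "native TensorFlow"
    else:
        mode = "stub (no-TF mode)"
    # Assumption: the stub is only used when TensorFlow is absent.
    print(f"TensorBoard would likely use: {mode}")
```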

@arcra arcra added the core:notf Things related to No TensorFlow mode. label Jan 8, 2024