No reloads when using S3 compatible storage #6712
Comments
Yes, sorry for using the wrong terminology, when I wrote "experiments" I was referring to "runs".
No, the only console logs are:
It does not seem like it, since I do not see any errors.
No worries, I just wanted to make sure I understand the issue correctly. I believe it's (similarly to #6713) an issue with our "no-TF compatibility" implementation of the GFile interface, particularly its support for S3 files. A possible workaround is to install tensorflow, so that TensorBoard uses the TF implementation instead. If you try that, please confirm whether it solves the problem for you. Unfortunately, we don't have the bandwidth to investigate this in more detail. Our compat support for the S3 filesystem is provided on a best-effort basis.
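For anyone following along, here is a rough sketch of how one might check which file-access path is likely to be active in a given environment. It only assumes what is described above: that the TF implementation is used when the tensorflow package is importable, and the "no-TF" fallback otherwise. The function name `guess_gfile_backend` and the returned labels are made up for illustration and are not part of TensorBoard.

```python
import importlib.util


def guess_gfile_backend() -> str:
    """Guess which GFile implementation TensorBoard would pick.

    Assumption: TensorBoard uses the TensorFlow implementation whenever the
    `tensorflow` package can be imported, and its own no-TF fallback otherwise.
    This only checks importability; it does not inspect TensorBoard itself.
    """
    if importlib.util.find_spec("tensorflow") is not None:
        return "tensorflow"
    return "no-tf fallback"


print("Likely GFile backend:", guess_gfile_backend())
```

Running this inside the same virtual environment or container as TensorBoard should indicate whether installing tensorflow actually took effect there.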
I tried to install tensorflow, but then I get the error:
My workaround is a bad one: I wrote a simple bash script that restarts tensorboard every minute. That way all runs are reloaded every minute, which works for my use case.
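A minimal sketch of that restart loop, written in Python rather than bash, in case it is useful to others. This is not the original script; the helper name `restart_periodically` and the `cycles` parameter are hypothetical, and the command to run is up to the caller.

```python
import subprocess
import time
from typing import Optional, Sequence


def restart_periodically(
    cmd: Sequence[str], interval_s: float, cycles: Optional[int] = None
) -> int:
    """Start cmd, stop it after interval_s seconds, then start it again.

    cycles=None loops forever; an integer limits the number of starts
    (useful for trying the loop out). Returns how many times cmd was started.
    """
    started = 0
    while cycles is None or started < cycles:
        proc = subprocess.Popen(list(cmd))
        started += 1
        try:
            time.sleep(interval_s)
        finally:
            # Ask the process to exit, and force-kill it if it does not.
            proc.terminate()
            try:
                proc.wait(timeout=10)
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()
    return started
```

For the setup described here, the call would be something like `restart_periodically(["tensorboard", "--logdir", "s3://my-bucket/runs"], interval_s=60)`, where the bucket path is a placeholder. The obvious cost is that the UI is briefly unavailable on every restart and all runs are re-read from storage each time.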
Looks like there's a separate package that might provide support for that filesystem. Can you try installing Sources:
In addition to this issue, I posted a question on Stack Overflow.
Environment information
Diagnostics
Diagnostics output
Issue description
Here is a repository with a full reproduction of my issue:
https://github.com/AlonKellner/s3-tensorboard-issue-reproduction
When using tensorboard with an s3 compatible storage, only the first experiments that the server comes across are shown in the UI.
All experiments that are present during startup are shown fully. If no experiment is present during startup, the first detected experiment will be shown partially.
After an experiment is detected and shown, no further steps and experiments will be reloaded and shown.
When using the --reload_task process option, no experiment is shown whatsoever.
I have personally reproduced this unexpected behavior with both ceph (with an on-prem instance) and minio (with a local docker image, see reproduction repo).
The expected behavior is that any new experiment that is written to the s3 compatible storage should be reloaded in the UI when pressing the reload button, as well as new steps in that new experiment.
Also, I expect this behavior to work correctly with the --reload_multifile=true option.
Workarounds are also welcome, thanks :)