tb shows blank screen instead of report. Console inspection = failed to load resource: net::ERR_CONTENT_LENGTH_MISMATCH #6789

Open
lessw2020 opened this issue Mar 15, 2024 · 2 comments

Comments

@lessw2020

Hi - We're trying to consolidate on TensorBoard but are sporadically hitting an issue where reports won't load and we just get a blank white screen instead.
This occurs on multiple machines, and where we have run it, diagnose_tensorboard does not report any issues (full details below).
We have upgraded and are running the latest tensorboard (2.16.2), and the tb data is all there (verified using the --inspect option).
We have tried a whole bunch of things (uninstall/reinstall, restart, Chrome vs Safari, --bind_all, etc.) with no luck.
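
As a complement to the --inspect check, the files on disk can be listed directly. This is a minimal sketch, assuming the log directory holds event files whose names contain `tfevents` (TensorBoard's usual naming); `list_event_files` is a hypothetical helper written for illustration, not part of TensorBoard.

```python
import os


def list_event_files(logdir):
    """Return sorted (path, size_in_bytes) pairs for event files under logdir.

    Assumption: event files are recognised by 'tfevents' in the file name.
    """
    found = []
    for root, _dirs, files in os.walk(logdir):
        for name in files:
            if "tfevents" in name:
                path = os.path.join(root, name)
                found.append((path, os.path.getsize(path)))
    return sorted(found)
```

Calling `list_event_files("outputs/tb")` (the path is a placeholder) and printing the result shows whether any event file is empty or missing, which only confirms the data is on disk and says nothing about the HTTP error itself.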

The most promising lead so far is that inspecting the page in Chrome reveals:
failed to load resource: net::ERR_CONTENT_LENGTH_MISMATCH
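
This Chrome error indicates that the body received did not match the size announced in the `Content-Length` response header. One way to check whether the same thing happens outside the browser is to fetch a resource with Python's standard library and compare the two numbers. The sketch below is only an illustration: `check_content_length` is a hypothetical helper, and it handles plain `http://` URLs only.

```python
import http.client
from urllib.parse import urlsplit


def check_content_length(url, timeout=10.0):
    """Fetch url and compare the declared Content-Length with the bytes received."""
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        resp = conn.getresponse()
        declared = resp.getheader("Content-Length")
        try:
            body = resp.read()
        except http.client.IncompleteRead as exc:
            # The connection closed before the declared number of bytes arrived.
            body = exc.partial
        received = len(body)
        return {
            "status": resp.status,
            "declared": int(declared) if declared is not None else None,
            "received": received,
            "mismatch": declared is not None and int(declared) != received,
        }
    finally:
        conn.close()
```

For example, `check_content_length("http://localhost:6006/")` assumes TensorBoard is listening on its default port 6006; the URL of whichever resource Chrome flags in the Network tab can be substituted. If the mismatch appears only through a proxy or tunnel and not on the host itself, that points at something between the server and the browser.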

Is there any input or advice here on how to further debug/resolve this? (Chrome screenshot of the error below.)
Thanks in advance!

[Image: Screenshot 2024-03-15 at 8 51 19 AM]

Results of diagnose_tensorboard below:
python diagnose_tensorboard.py

Diagnostics

Diagnostics output
--- check: autoidentify
INFO: diagnose_tensorboard.py version df7af2c6fc0e4c4a5b47aeae078bc7ad95777ffa

--- check: general
INFO: sys.version_info: sys.version_info(major=3, minor=9, micro=12, releaselevel='final', serial=0)
INFO: os.name: posix
INFO: os.uname(): posix.uname_result(sysname='Linux', nodename='submit-1', release='5.15.0-1048-aws', version='#53~20.04.1-Ubuntu SMP Wed Oct 4 16:44:20 UTC 2023', machine='x86_64')
INFO: sys.getwindowsversion(): N/A

--- check: package_management
INFO: has conda-meta: True
INFO: $VIRTUAL_ENV: None

--- check: installed_packages
INFO: installed: tensorboard==2.16.2
WARNING: no installation among: ['tensorflow', 'tensorflow-gpu', 'tf-nightly', 'tf-nightly-2.0-preview', 'tf-nightly-gpu', 'tf-nightly-gpu-2.0-preview']
WARNING: no installation among: ['tensorflow-estimator', 'tensorflow-estimator-2.0-preview', 'tf-estimator-nightly']
INFO: installed: tensorboard-data-server==0.7.2

--- check: tensorboard_python_version
INFO: tensorboard.version.VERSION: '2.16.2'

--- check: tensorflow_python_version
Traceback (most recent call last):
  File "/opt/hpcaas/.mounts/fs-5c62ddab/home/less/torchtrain/outputs/tb/diagnose_tensorboard.py", line 511, in main
    suggestions.extend(check())
  File "/opt/hpcaas/.mounts/fs-5c62ddab/home/less/torchtrain/outputs/tb/diagnose_tensorboard.py", line 81, in wrapper
    result = fn()
  File "/opt/hpcaas/.mounts/fs-5c62ddab/home/less/torchtrain/outputs/tb/diagnose_tensorboard.py", line 267, in tensorflow_python_version
    import tensorflow as tf
ModuleNotFoundError: No module named 'tensorflow'

--- check: tensorboard_data_server_version
INFO: data server binary: '/data/home/less/miniconda3/lib/python3.9/site-packages/tensorboard_data_server/bin/server'
INFO: data server binary version: b'rustboard 0.7.2'

--- check: tensorboard_binary_path
INFO: which tensorboard: b'/data/home/less/miniconda3/bin/tensorboard\n'

--- check: addrinfos
socket.has_ipv6 = True
socket.AF_UNSPEC = <AddressFamily.AF_UNSPEC: 0>
socket.SOCK_STREAM = <SocketKind.SOCK_STREAM: 1>
socket.AI_ADDRCONFIG = <AddressInfo.AI_ADDRCONFIG: 32>
socket.AI_PASSIVE = <AddressInfo.AI_PASSIVE: 1>
Loopback flags: <AddressInfo.AI_ADDRCONFIG: 32>
Loopback infos: [(<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('::1', 0, 0, 0)), (<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('127.0.0.1', 0))]
Wildcard flags: <AddressInfo.AI_PASSIVE: 1>
Wildcard infos: [(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('0.0.0.0', 0)), (<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('::', 0, 0, 0))]

--- check: readable_fqdn
INFO: socket.getfqdn(): 'submit-1.pytorch.hpcaas'

--- check: stat_tensorboardinfo
INFO: directory: /tmp/.tensorboard-info
INFO: os.stat(...): os.stat_result(st_mode=16895, st_ino=9986583, st_dev=66305, st_nlink=2, st_uid=2232, st_gid=2232, st_size=4096, st_atime=1709866713, st_mtime=1710516966, st_ctime=1710516966)
INFO: mode: 0o40777

--- check: source_trees_without_genfiles
INFO: tensorboard_roots (1): ['/data/home/less/miniconda3/lib/python3.9/site-packages']; bad_roots (0): []

--- check: full_pip_freeze
INFO: pip freeze --all:
absl-py==2.1.0
accelerate==0.22.0
addict==2.4.0
-e git+https://github.com/lessw2020/OLMo.git@8877cf4c1522b51a1257b2462fc05942270f4f7c#egg=ai2_olmo
aiohttp==3.9.1
aiosignal==1.3.1
alabaster==0.7.13
alembic==1.13.0
antlr4-python3-runtime==4.9.3
anyascii==0.3.2
anyio==3.7.1
appdirs==1.4.4
args==0.1.0
asttokens @ file:///home/conda/feedstock_root/build_artifacts/asttokens_1670263926556/work
async-timeout==4.0.3
attrs==23.1.0
auditnlg==0.0.1
auto-gptq @ file:///opt/hpcaas/.mounts/fs-5c62ddab/home/less/AutoGPTQ_Triton
Automat==22.10.0
Babel==2.13.0
backcall @ file:///home/conda/feedstock_root/build_artifacts/backcall_1592338393461/work
backports.functools-lru-cache @ file:///home/conda/feedstock_root/build_artifacts/backports.functools_lru_cache_1687772187254/work
beaker-gantry==0.21.0
beaker-py==1.25.0
beartype==0.15.0
beautifulsoup4==4.12.2
bitsandbytes==0.39.1
black==23.12.1
blinker==1.7.0
blis==0.7.11
blobfile==2.0.2
boltons==23.1.1
boto3==1.34.48
botocore==1.34.48
Brotli==1.0.9
brotlipy==0.7.0
build==1.0.3
buildtools==1.0.6
cached_path==1.6.0
cachetools==5.3.1
catalogue==2.0.10
causal-conv1d==1.1.1
certifi==2023.11.17
cffi @ file:///opt/conda/conda-bld/cffi_1642701102775/work
cfgv==3.4.0
charset-normalizer==3.3.2
click==8.1.7
click-help-colors==0.9.4
clint==0.5.1
cloudpathlib==0.15.1
cloudpickle==3.0.0
cmake==3.27.6
colorama @ file:///tmp/build/80754af9/colorama_1607707115595/work
coloredlogs==15.0.1
CoLT5-attention==0.10.15
conda==4.14.0
conda-content-trust @ file:///tmp/build/80754af9/conda-content-trust_1617045594566/work
conda-package-handling @ file:///tmp/build/80754af9/conda-package-handling_1649105784853/work
confection==0.1.3
constantly==23.10.4
contourpy==1.2.0
contractions==0.1.73
coverage==7.3.1
cryptography @ file:///tmp/build/80754af9/cryptography_1639414572950/work
cycler==0.12.1
cymem==2.0.8
dadaptation==3.1
DALL-E==0.1
databricks-cli==0.18.0
dataclasses-json==0.6.1
datasets==2.18.0
decorator @ file:///home/conda/feedstock_root/build_artifacts/decorator_1641555617451/work
detoxify==0.5.0
dill==0.3.7
distlib==0.3.7
docformatter==1.5.1
docker==6.1.3
docker-pycreds==0.4.0
docopt==0.6.2
docutils==0.15.2
edlib==1.3.9
einops==0.7.0
emoji==2.8.0
entrypoints @ file:///home/conda/feedstock_root/build_artifacts/entrypoints_1643888246732/work
exceptiongroup==1.1.3
executing @ file:///home/conda/feedstock_root/build_artifacts/executing_1667317341051/work
face==20.1.1
fairscale==0.4.13
fastapi==0.103.2
fastjsonschema==2.19.0
filelock==3.13.1
fire==0.5.0
flake8==4.0.1
flake8-bugbear==22.4.25
flake8-polyfill==1.0.2
Flask==3.0.0
fonttools==4.46.0
frozenlist==1.4.0
fsspec==2023.10.0
ftfy==6.1.3
furl==2.1.3
fuzzywuzzy==0.18.0
gdown==4.7.1
gekko==1.0.6
gitdb==4.0.11
gitdb2==4.0.2
GitPython==3.1.40
glom==23.5.0
google-api-core==2.12.0
google-api-python-client==2.102.0
google-auth==2.23.2
google-auth-httplib2==0.1.1
google-cloud-core==2.3.3
google-cloud-storage==2.11.0
google-crc32c==1.5.0
google-resumable-media==2.6.0
googleapis-common-protos==1.60.0
greenlet==3.0.2
grpcio==1.60.1
gunicorn==21.2.0
h11==0.14.0
httplib2==0.22.0
httptools==0.6.0
huggingface-hub==0.19.4
humanfriendly==10.0
hydra-core==1.3.2
hyperlink==21.0.0
identify==2.5.29
idna==3.6
imagesize==1.4.1
importlib-metadata==7.0.0
importlib-resources==6.0.1
incremental==22.10.0
inflate64==0.3.1
iniconfig==2.0.0
inquirerpy==0.3.4
iopath==0.1.10
ipykernel @ file:///home/conda/feedstock_root/build_artifacts/ipykernel_1620912942381/work/dist/ipykernel-5.5.5-py3-none-any.whl
ipython @ file:///home/conda/feedstock_root/build_artifacts/ipython_1685727741709/work
ipython-genutils==0.2.0
isort==5.12.0
itsdangerous==2.1.2
jaraco.classes==3.3.1
jedi @ file:///home/conda/feedstock_root/build_artifacts/jedi_1690896916983/work
jeepney==0.8.0
Jinja2==3.1.2
jmespath==1.0.1
joblib==1.3.2
json5==0.9.14
jsonlines==4.0.0
jsonpatch==1.33
jsonpointer==2.4
jsonschema==4.17.3
jupyter-client @ file:///home/conda/feedstock_root/build_artifacts/jupyter_client_1633454794268/work
jupyter_core @ file:///home/conda/feedstock_root/build_artifacts/jupyter_core_1686775603087/work
keyring==24.3.0
kiwisolver==1.4.5
langchain==0.0.309
langcodes==3.3.0
langsmith==0.0.42
libcst==1.0.1
lightning-utilities==0.10.1
lit==17.0.2
-e git+https://github.com/lessw2020/llama-recipes.git@38df368a70c292db6e42e0f275139678497fd896#egg=llama_recipes
local-attention==1.8.6
loralib==0.1.2
-e git+https://github.com/thu-ml/low-bit-optimizers.git@e3e2854728e498c2a606e3fdb88daa27ae94f9a6#egg=lpmm
lxml==4.9.3
Mako==1.3.0
mamba==0.11.3
Markdown==3.5.1
markdown-it-py==3.0.0
MarkupSafe==2.1.3
marshmallow==3.20.1
matplotlib==3.8.2
matplotlib-inline @ file:///home/conda/feedstock_root/build_artifacts/matplotlib-inline_1660814786464/work
mccabe==0.6.1
mdurl==0.1.2
mlflow==2.9.1
mmcv==1.3.8
mmsegmentation==0.14.1
mock==5.1.0
more-itertools==10.2.0
moreorless==0.4.0
mpmath==1.3.0
msgpack==1.0.7
msgspec==0.18.6
multidict==6.0.4
multiprocess==0.70.15
multivolumefile==0.2.3
murmurhash==1.0.10
mypy==1.0.1
mypy-extensions==1.0.0
myst-parser==0.12.10
necessary==0.4.3
nest-asyncio @ file:///home/conda/feedstock_root/build_artifacts/nest-asyncio_1664684991461/work
networkx==3.2.1
nh3==0.2.15
ninja==1.11.1.1
nltk==3.8.1
nodeenv==1.8.0
numpy==1.26.2
nvidia-cublas-cu11==11.10.3.66
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu11==8.5.0.96
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu11==10.9.0.58
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu11==10.2.10.91
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu11==11.7.4.91
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu11==2.14.3
nvidia-nccl-cu12==2.19.3
nvidia-nvjitlink-cu12==12.1.105
nvidia-nvtx-cu11==11.7.91
nvidia-nvtx-cu12==12.1.105
oauthlib==3.2.2
omegaconf==2.3.0
openai==0.28.1
opencv-python==4.8.1.78
optimum==1.12.0
orderedmultidict==1.0.1
packaging==23.2
pandas==2.1.3
parlai==1.7.0
parso @ file:///home/conda/feedstock_root/build_artifacts/parso_1638334955874/work
pathspec==0.11.2
pathtools==0.1.2
pathy==0.10.2
peft @ git+https://github.com/huggingface/peft.git@85013987aa82aa1af3da1236b6902556ce3e483e
pep8-naming==0.12.1
petname==2.6
pexpect @ file:///home/conda/feedstock_root/build_artifacts/pexpect_1667297516076/work
pfzy==0.3.4
pickleshare @ file:///home/conda/feedstock_root/build_artifacts/pickleshare_1602536217715/work
Pillow==10.1.0
pip==24.0
pkginfo==1.9.6
platformdirs==4.1.0
pluggy==1.3.0
portalocker==2.8.2
pre-commit==3.4.0
preshed==3.0.9
prettytable==3.9.0
prometheus-client==0.19.0
prompt-toolkit @ file:///home/conda/feedstock_root/build_artifacts/prompt-toolkit_1688565951714/work
protobuf==4.25.1
psutil==5.9.5
ptyprocess @ file:///home/conda/feedstock_root/build_artifacts/ptyprocess_1609419310487/work/dist/ptyprocess-0.7.0-py2.py3-none-any.whl
pure-eval @ file:///home/conda/feedstock_root/build_artifacts/pure_eval_1642875951954/work
py-gfm==2.0.0
py-rouge==1.1
py7zr==0.20.6
pyahocorasick==2.0.0
pyarrow==14.0.1
pyarrow-hotfix==0.6
pyasn1==0.5.0
pyasn1-modules==0.3.0
pybcj==1.0.1
pycodestyle==2.8.0
pycosat==0.6.3
pycparser @ file:///tmp/build/80754af9/pycparser_1636541352034/work
pycryptodomex==3.18.0
pydantic==1.10.13
pydot==1.4.2
pyflakes==2.4.0
Pygments @ file:///home/conda/feedstock_root/build_artifacts/pygments_1691408637400/work
PyJWT==2.8.0
pyOpenSSL @ file:///opt/conda/conda-bld/pyopenssl_1643788558760/work
pyparsing==3.1.1
pyppmd==1.0.0
pyproject_hooks==1.0.0
pyrsistent==0.19.3
PySocks @ file:///tmp/build/80754af9/pysocks_1605305812635/work
pytest==8.0.1
pytest-cov==4.1.0
pytest-datadir==1.5.0
pytest-mock==3.8.2
pytest-regressions==2.5.0
pytest-sphinx==0.6.0
python-dateutil @ file:///home/conda/feedstock_root/build_artifacts/python-dateutil_1626286286081/work
python-dotenv==1.0.0
python-hostlist==1.23.0
python-json-logger==2.0.7
pytorch-triton==3.0.0+a9bc1a3647
pytz==2023.3.post1
PyYAML==6.0.1
pyzmq==25.1.1
pyzstd==0.15.9
querystring-parser==1.2.4
rank-bm25==0.2.2
ray==2.7.0
readme-renderer==42.0
redo==2.0.4
regex==2023.10.3
requests==2.31.0
requests-mock==1.11.0
requests-toolbelt==1.0.0
requirements-parser==0.5.0
responses==0.18.0
rfc3339-validator==0.1.4
rfc3986==2.0.0
rfc3986-validator==0.1.1
rich==13.7.0
rouge==1.0.1
rpds-py==0.13.1
rsa==4.9
ruamel-yaml-conda @ file:///tmp/build/80754af9/ruamel_yaml_1616016711199/work
ruff==0.1.7
s3transfer==0.10.0
safetensors==0.4.1
scikit-learn==1.3.2
scipy==1.11.4
SecretStorage==3.3.3
Send2Trash==1.8.2
sentencepiece==0.1.99
sentry-sdk==1.32.0
setproctitle==1.3.3
setuptools==69.0.2
sh==2.0.6
simplejson==3.19.2
six @ file:///tmp/build/80754af9/six_1644875935023/work
smart-open==6.4.0
smashed==0.21.5
smmap==5.0.1
sniffio==1.3.0
snowballstemmer==2.2.0
soupsieve==2.5
spacy==3.7.1
spacy-legacy==3.0.12
spacy-loggers==1.0.5
Sphinx==2.2.2
sphinx-autodoc-typehints==1.10.3
sphinx-rtd-theme==1.3.0
sphinxcontrib-applehelp==1.0.4
sphinxcontrib-devhelp==1.0.2
sphinxcontrib-htmlhelp==2.0.1
sphinxcontrib-jquery==4.1
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.3
sphinxcontrib-serializinghtml==1.1.5
SQLAlchemy==2.0.23
sqlparse==0.4.4
srsly==2.4.8
st-moe-pytorch @ file:///data/home/less/st-moe
stack-data @ file:///home/conda/feedstock_root/build_artifacts/stack_data_1669632077133/work
starlette==0.27.0
stdlibs==2022.10.9
subword-nmt==0.3.8
sympy==1.12
tabulate==0.9.0
tenacity==8.2.3
tensorboard==2.16.2
tensorboard-data-server==0.7.2
tensorboardX==2.6.2.2
termcolor==2.3.0
terminado==0.18.0
textsearch==0.0.24
texttable==1.6.7
tf-keras==2.15.0
thinc==8.2.1
threadpoolctl==3.2.0
tiktoken==0.5.2
timm==0.4.12
tinycss2==1.2.1
tokenize-rt==5.2.0
tokenizers==0.15.0
toml==0.10.2
tomli==2.0.1
tomlkit==0.12.1
toolz @ file:///home/conda/feedstock_root/build_artifacts/toolz_1657485559105/work
torch==2.3.0.dev20240307+cu121
torchaudio==2.2.0.dev20240307+cu121
torchdata==0.6.1
torchinfo==1.8.0
torchmetrics==1.3.1
-e git+https://github.com/lessw2020/tau_graph.git@06b9eeddb971d39078b5774c2be8b67181562ca6#egg=torchpippy
torchtext==0.15.2
torchvision==0.18.0.dev20240307+cu121
tornado==6.3.3
tqdm==4.66.1
trailrunner==1.4.0
traitlets @ file:///home/conda/feedstock_root/build_artifacts/traitlets_1675110562325/work
transformers==4.37.2
triton==2.1.0
triton-nightly==2.1.0.post20231218080902
trouting==0.3.3
twine==5.0.0
Twisted==23.10.0
typer==0.9.0
types-python-dateutil==2.8.19.14
types-setuptools==69.1.0.20240223
typing-inspect==0.9.0
typing_extensions==4.8.0
tzdata==2023.3
ufmt==1.3.0
Unidecode==1.3.7
untokenize==0.1.1
uri-template==1.3.0
uritemplate==4.1.1
urllib3==1.26.18
usort==1.0.2
uvicorn==0.23.2
uvloop==0.17.0
virtualenv==20.24.5
vit-pytorch==1.4.1
vllm==0.2.0
wandb==0.16.3
wasabi==1.1.2
watchfiles==0.20.0
wcwidth==0.2.12
weasel==0.3.2
webcolors==1.13
webencodings==0.5.1
websocket-client==1.7.0
websockets==11.0.3
Werkzeug==3.0.1
wheel==0.37.1
x-transformers==1.18.2
xformers==0.0.22
xxhash==3.4.1
yacs==0.1.8
yapf==0.40.2
yarl==1.9.4
zipp==3.17.0
zope.interface==6.1

Next steps

No action items identified. Please copy ALL of the above output,
including the lines containing only backticks, into your GitHub issue
or comment. Be sure to redact any sensitive information.

@arcra
Member

arcra commented Mar 27, 2024

Hi, do you get an entirely blank page in Chrome, similar to what you see when you go to about:blank? Does a refresh not solve it? Does the console show any error messages or warnings?

When this happens, can you clear your browser cache and try again? I've been bitten by cache issues with other apps. Perhaps not with this particular error, but I read somewhere that a stale cache could be a reason for it, and when it happened to me, I was pleasantly surprised to learn that just clearing the cache made it work.

I've also occasionally experienced slowness from TB reading the data when there's too much data to load. It might just take some time, but I'd try the cache thing first.

Apart from that, I found this Stack Overflow post where they claim that if you're accessing the app from a different device, it might be an issue with the "angular app" being too big combined with a slow connection. (Likely not the case here, but I thought I'd mention it in case it's relevant.)
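
To get a rough idea of whether page size or connection speed is a factor, the download can be timed from the machine where the browser runs. This is a minimal sketch using only the Python standard library; `time_download` is a hypothetical helper, it handles plain `http://` URLs only, and the numbers it returns are a single unscientific measurement.

```python
import http.client
import time
from urllib.parse import urlsplit


def time_download(url, timeout=30.0):
    """Download url once and return the body size and the elapsed seconds."""
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=timeout)
    start = time.perf_counter()
    try:
        conn.request("GET", parts.path or "/")
        body = conn.getresponse().read()
    finally:
        conn.close()
    elapsed = time.perf_counter() - start
    return {"bytes": len(body), "seconds": elapsed}
```

For example, `time_download("http://localhost:6006/")` assumes TensorBoard's default port 6006. Comparing the result on the server itself with the result from the remote device gives a hint about whether the connection is the bottleneck.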

Please try clearing the cache and let us know if that solves the issue.

@onevfall

onevfall commented May 9, 2024

I have this problem too. How did you solve it?
