RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory #563

Open
Jimmy-Yang1217 opened this issue Apr 27, 2024 · 1 comment
Labels
type/bug An issue about a bug

Comments

@Jimmy-Yang1217

🐛 Describe the bug

I am new to OLMo and want to continue training (i.e., fine-tune) several of the checkpoints listed in the CSV under checkpoints/official.

I followed the instructions in the README and downloaded a checkpoint via its link, but loading always fails with 'RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory'.

According to answers to similar questions on Stack Overflow, this error may be caused by a corrupted checkpoint file or a mismatched torch version. I tried different checkpoints and torch versions from 2.0.0 to 2.3.0, but the error is still there. The checkpoint download also appears to finish, reaching 100%, so the ckpt files should not be corrupted.
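A progress bar reaching 100% does not by itself prove the file on disk is complete, so the corruption hypothesis can be checked directly. Files written by `torch.save` in its default format (since PyTorch 1.6) are zip archives, and "failed finding central directory" is what the reader reports when the index at the end of that archive is missing. Below is a minimal sketch that uses only the Python standard library; the function name `check_checkpoint_archive` and any file paths passed to it are hypothetical, and where the downloaded files are cached on disk is an assumption to verify locally.

```python
import sys
import zipfile
from pathlib import Path


def check_checkpoint_archive(path):
    """Return (ok, message) for whether `path` looks like an intact zip archive.

    Hypothetical helper for illustration; it is not part of OLMo or PyTorch.
    """
    p = Path(path)
    if not p.is_file():
        return False, f"{p}: not a regular file"

    size = p.stat().st_size
    # is_zipfile() looks for the end-of-central-directory record near the
    # end of the file, which is the structure the PyTorch error refers to.
    if not zipfile.is_zipfile(p):
        return False, f"{p}: {size} bytes, no zip central directory found"

    # testzip() reads every member and verifies its CRC; this can be slow
    # for multi-gigabyte checkpoints. It returns the first bad member name.
    with zipfile.ZipFile(p) as zf:
        bad_member = zf.testzip()
    if bad_member is not None:
        return False, f"{p}: member {bad_member!r} failed its CRC check"

    return True, f"{p}: {size} bytes, archive looks intact"


if __name__ == "__main__":
    # Usage: python check_ckpt.py <file> [<file> ...]
    for arg in sys.argv[1:]:
        ok, message = check_checkpoint_archive(arg)
        print("OK  " if ok else "FAIL", message)
```

If this reports a missing central directory for a downloaded file, the file is truncated or is not a zip archive at all (for example, an error page saved in place of the checkpoint), and changing the torch version would not be expected to help.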

Here is my terminal command:
torchrun --nproc_per_node=1 scripts/train.py configs/official/OLMo-1B.yaml --load_path=https://olmo-checkpoints.org/ai2-llm/olmo-small/46zc5fly/step369000-unsharded --save_folder=/opt/data/private/OLMo/olmo/step369000 --wandb=null

And the error:
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

Versions

Python 3.8.10
accelerate==0.25.0
ai2-olmo==0.3.1
aiofiles==23.2.1
aiohttp==3.8.6
aiosignal==1.3.1
albumentations==1.3.1
altair==5.1.2
annotated-types==0.6.0
antlr4-python3-runtime==4.9.3
anyio==4.0.0
apache-beam==2.55.1
async-timeout==4.0.3
attrs==23.1.0
backports.zoneinfo==0.2.1
beautifulsoup4==4.12.3
bitsandbytes==0.41.3.post2
boto3==1.34.93
botocore==1.34.93
braceexpand==0.1.7
cached-path==1.6.2
cachetools==5.3.3
certifi==2019.11.28
chardet==3.0.4
charset-normalizer==3.3.1
click==8.1.7
clip==0.2.0
clip-benchmark==1.5.0
cloudpickle==2.2.1
cmake==3.27.7
contourpy==1.1.1
crcmod==1.7
cycler==0.12.1
dataclasses==0.6
datasets==2.14.5
dbus-python==1.2.16
dill==0.3.1.1
dnspython==2.6.1
docker-pycreds==0.4.0
docopt==0.6.2
exceptiongroup==1.1.3
ExifRead-nocycle==3.0.1
fastapi==0.104.1
fastavro==1.9.4
fasteners==0.19
ffmpy==0.3.1
filelock==3.12.4
fire==0.4.0
fonttools==4.44.0
frozenlist==1.4.0
fsspec==2023.9.2
ftfy==6.1.1
gdown==5.1.0
gitdb==4.0.11
GitPython==3.1.40
google-api-core==2.18.0
google-auth==2.29.0
google-cloud-core==2.4.1
google-cloud-storage==2.16.0
google-crc32c==1.5.0
google-resumable-media==2.7.0
googleapis-common-protos==1.63.0
gradio==3.39.0
gradio-client==0.7.0
grpcio==1.62.2
h11==0.14.0
hdfs==2.7.3
httpcore==1.0.2
httplib2==0.22.0
httpx==0.25.1
huggingface-hub==0.22.2
idna==2.8
imageio==2.31.6
img2dataset==1.42.0
importlib-resources==6.1.1
Jinja2==3.1.2
jmespath==1.0.1
joblib==1.3.2
Js2Py==0.74
jsonpickle==3.0.4
jsonschema==4.20.0
jsonschema-specifications==2023.11.1
kiwisolver==1.4.5
lazy-loader==0.3
lightning-utilities==0.11.2
linkify-it-py==2.0.2
lit==17.0.3
loguru==0.7.2
loralib==0.1.2
markdown-it-py==3.0.0
MarkupSafe==2.1.3
matplotlib==3.7.3
mdit-py-plugins==0.3.3
mdurl==0.1.2
mpmath==1.3.0
multidict==6.0.4
multiprocess==0.70.15
networkx==3.1
numpy==1.21.0
nvidia-cublas-cu11==11.10.3.66
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu11==8.5.0.96
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu11==10.9.0.58
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu11==10.2.10.91
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu11==11.7.4.91
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu11==2.14.3
nvidia-nccl-cu12==2.19.3
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu11==11.7.91
nvidia-nvtx-cu12==12.1.105
objsize==0.7.0
omegaconf==2.3.0
open-clip-torch==2.23.0
openai-clip==1.0.1
opencv-python-headless==4.8.1.78
orjson==3.9.10
packaging==23.2
pandas==1.5.3
pathtools==0.1.2
peft==0.7.1
Pillow==9.1.1
pkgutil-resolve-name==1.3.10
promise==2.3
proto-plus==1.23.0
protobuf==3.20.3
psutil==5.9.6
pyarrow==10.0.1
pyarrow-hotfix==0.6
pyasn1==0.6.0
pyasn1-modules==0.4.0
pycocoevalcap==1.2
pycocotools==2.0.7
pydantic==2.5.1
pydantic-core==2.14.3
pydot==1.4.2
pydub==0.25.1
pygments==2.17.2
PyGObject==3.36.0
pyjsparser==2.7.1
pymongo==4.7.0
pyparsing==3.1.1
PySocks==1.7.1
python-apt==2.0.0+ubuntu0.20.4.7
python-dateutil==2.8.2
python-multipart==0.0.6
pytz==2023.3.post1
PyWavelets==1.4.1
PyYAML==6.0.1
quant-cuda==0.0.0
qudida==0.0.4
referencing==0.31.0
regex==2023.10.3
requests==2.31.0
requests-unixsocket==0.2.0
rich==13.7.1
rpds-py==0.13.0
rsa==4.9
s3transfer==0.10.1
safetensors==0.4.3
scikit-image==0.21.0
scikit-learn==1.3.2
scipy==1.10.1
semantic-version==2.10.0
sentencepiece==0.1.99
sentry-sdk==1.33.1
setproctitle==1.3.3
shortuuid==1.0.11
six==1.14.0
smmap==5.0.1
sniffio==1.3.0
soupsieve==2.5
starlette==0.27.0
sympy==1.12
termcolor==2.3.0
threadpoolctl==3.2.0
tifffile==2023.7.10
timm==0.9.10
tokenizers==0.19.1
toolz==0.12.0
torch==2.0.0
torch-summary==1.4.5
torchaudio==0.9.0
torchmetrics==1.3.2
torchvision==0.15.2+cu117
tqdm==4.66.1
transformers==4.40.1
triton==2.0.0
typing-extensions==4.11.0
tzdata==2023.3
tzlocal==5.2
uc-micro-py==1.0.2
urllib3==1.26.18
uvicorn==0.24.0.post1
wandb==0.12.21
wcwidth==0.2.8
webdataset==0.2.72
websockets==11.0.3
xxhash==3.4.1
yarl==1.9.2
zipp==3.17.0
zstandard==0.22.0

Jimmy-Yang1217 added the type/bug label Apr 27, 2024
@dumitrac
Contributor

@Jimmy-Yang1217 - could you please include the log before the error occurs?
I'm curious when exactly the error is thrown. Thank you!
