Error when using pytorch dataloader with h5py when num_workers > 0 #542

Open
Boltzmachine opened this issue Mar 27, 2023 · 1 comment
Labels
bug Something isn't working

@Boltzmachine

🐛 Describe the bug

Here is a minimal example to reproduce the problem:

└── test
    ├── __init__.py
    ├── main.py
    └── test.jsonnet

In main.py:

import torch
from tango import Step
import h5py

class Dataset(torch.utils.data.Dataset):
    def __init__(self) -> None:
        super().__init__()
        self.file = h5py.File("data.h5", "w")

    def __getitem__(self, idx):
        return None

@Step.register("main")
class Main(Step):
    def run(self):
        dataset = Dataset()
        loader = torch.utils.data.DataLoader(dataset, num_workers=2)
        iter(loader)

In test.jsonnet:

{
    steps: {
        main: {
            type: "main"
        }
    }
}

__init__.py is empty.

When I run

tango run test/test.jsonnet -i test

I get the following error:

[03/26/23 20:46:38] ERROR    Uncaught exception
Traceback (most recent call last):
  File "/gpfs/gibbs/project/ying_rex/wq44/conda_envs/neuro/lib/python3.8/site-packages/tango/step.py", line 468, in _run_with_work_dir
    result = self.run(**kwargs)
  File "/gpfs/gibbs/project/ying_rex/wq44/Yale-Brain-Graph/test/main.py", line 19, in run
    iter(loader)
  File "/gpfs/gibbs/project/ying_rex/wq44/conda_envs/neuro/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 368, in __iter__
    return self._get_iterator()
  File "/gpfs/gibbs/project/ying_rex/wq44/conda_envs/neuro/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 314, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/gpfs/gibbs/project/ying_rex/wq44/conda_envs/neuro/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 927, in __init__
    w.start()
  File "/gpfs/gibbs/project/ying_rex/wq44/conda_envs/neuro/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/gpfs/gibbs/project/ying_rex/wq44/conda_envs/neuro/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/gpfs/gibbs/project/ying_rex/wq44/conda_envs/neuro/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/gpfs/gibbs/project/ying_rex/wq44/conda_envs/neuro/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/gpfs/gibbs/project/ying_rex/wq44/conda_envs/neuro/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/gpfs/gibbs/project/ying_rex/wq44/conda_envs/neuro/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/gpfs/gibbs/project/ying_rex/wq44/conda_envs/neuro/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "/gpfs/gibbs/project/ying_rex/wq44/conda_envs/neuro/lib/python3.8/site-packages/h5py/_hl/base.py", line 368, in __getnewargs__
    raise TypeError("h5py objects cannot be pickled")
TypeError: h5py objects cannot be pickled
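
The traceback passes through multiprocessing/popen_spawn_posix.py, which indicates that the DataLoader workers are being started with the "spawn" method. Spawn pickles the process object, and with it the dataset, through ForkingPickler, and that is where the open h5py.File is rejected. The following is a minimal sketch of that check. UnpicklableHandle, EagerDataset and survives_spawn are hypothetical names introduced here, and UnpicklableHandle is only a stand-in for h5py.File so that the snippet runs with the standard library alone:

```python
import multiprocessing as mp
import pickle
from multiprocessing.reduction import ForkingPickler


class UnpicklableHandle:
    """Stand-in for h5py.File (assumption): refuses to be pickled."""

    def __getnewargs__(self):
        # Same message as in the traceback above.
        raise TypeError("h5py objects cannot be pickled")


class EagerDataset:
    """Mirrors the Dataset above: the handle is created in __init__."""

    def __init__(self) -> None:
        self.file = UnpicklableHandle()


def survives_spawn(obj) -> bool:
    """Return True if obj can be pickled the way spawn-started workers need."""
    try:
        ForkingPickler.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


if __name__ == "__main__":
    # None means no start method has been fixed yet in this process.
    print("start method:", mp.get_start_method(allow_none=True))
    print("dataset survives spawn:", survives_spawn(EagerDataset()))
```

With the "fork" start method the dataset is not pickled at all, so the same object would not trigger this error.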

Versions

Python 3.8.13
ai2-tango==1.2.0
aiohttp==3.8.4
aiosignal==1.3.1
antlr4-python3-runtime==4.9.3
anyio==3.6.2
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
astor==0.8.1
asttokens==2.0.8
async-timeout==4.0.2
attrs==22.1.0
Babel==2.11.0
backcall==0.2.0
base58==2.1.1
beautifulsoup4==4.11.1
bids-validator==1.9.9
bleach==5.0.1
boto3==1.26.95
botocore==1.29.95
braceexpand==0.1.7
brotlipy @ file:///home/conda/feedstock_root/build_artifacts/brotlipy_1666764652625/work
cached-path==1.3.3
cachetools==5.3.0
certifi==2022.9.24
cffi @ file:///home/conda/feedstock_root/build_artifacts/cffi_1666754696558/work
charset-normalizer @ file:///home/conda/feedstock_root/build_artifacts/charset-normalizer_1661170624537/work
click==8.1.3
click-help-colors==0.9.1
contourpy==1.0.6
cryptography @ file:///home/conda/feedstock_root/build_artifacts/cryptography_1667422951827/work
cycler==0.11.0
datasets==2.10.1
debugpy==1.6.3
decorator==5.1.1
defusedxml==0.7.1
-e git+https://github.com/cvignac/DiGress.git@57df5f8c061d1839c8d101f9378484301061a49d#egg=DiGress
dill==0.3.6
docker-pycreds==0.4.0
docopt==0.6.2
einops==0.6.0
entrypoints==0.4
executing==1.1.0
fastjsonschema==2.16.2
fcd-torch==1.0.7
filelock==3.8.0
fonttools==4.38.0
formulaic==0.3.4
frozenlist==1.3.3
fsspec==2023.1.0
gitdb==4.0.9
GitPython==3.1.29
glob2==0.7
google-api-core==2.11.0
google-auth==2.16.2
google-cloud-core==2.3.2
google-cloud-storage==2.7.0
google-crc32c==1.5.0
google-resumable-media==2.4.1
googleapis-common-protos==1.58.0
googledrivedownloader==0.4
graphviz==0.20.1
h5py==3.7.0
huggingface-hub==0.10.1
Hydra==2.5
hydra-core==1.2.0
idna @ file:///home/conda/feedstock_root/build_artifacts/idna_1663625384323/work
importlib-metadata==5.0.0
importlib-resources==5.10.0
interface-meta==1.3.0
ipdb==0.13.9
ipykernel==6.17.1
ipython==8.5.0
ipython-genutils==0.2.0
ipywidgets==8.0.2
isodate==0.6.1
jedi==0.18.1
Jinja2==3.1.2
jmespath==1.0.1
joblib==1.2.0
json5==0.9.10
jsonschema==4.17.0
jupyter==1.0.0
jupyter-console==6.4.4
jupyter-server==1.23.1
jupyter_client==7.4.5
jupyter_core==5.0.0
jupyterlab==3.5.0
jupyterlab-pygments==0.2.2
jupyterlab-widgets==3.0.3
jupyterlab_server==2.16.3
kiwisolver==1.4.4
lightgbm==3.3.3
lightning-utilities==0.6.0.post0
load-confounds==0.12.0
lxml==4.9.2
markdown-it-py==2.2.0
MarkupSafe==2.1.1
matplotlib==3.6.3
matplotlib-inline==0.1.6
mdurl==0.1.2
mini-moses @ git+https://github.com/igor-krawczuk/mini-moses@1eda77fcc13045a93024835d403ff61634ac4b69
mistune==2.0.4
mkl-fft==1.3.1
mkl-random==1.2.2
mkl-service==2.4.0
more-itertools==9.1.0
multidict==6.0.4
multiprocess==0.70.14
nbclassic==0.4.8
nbclient==0.7.0
nbconvert==7.2.4
nbformat==5.7.0
nest-asyncio==1.5.6
networkx==2.8.8
nibabel==4.0.2
nilearn==0.9.0
notebook==6.5.2
notebook_shim==0.2.2
num2words==0.5.12
numpy @ file:///croot/numpy_and_numpy_base_1667233465264/work
omegaconf==2.2.3
overrides==7.3.1
packaging==21.3
pandas==1.5.1
pandocfilters==1.5.0
parso==0.8.3
pathtools==0.1.2
petname==2.6
pexpect==4.8.0
pickleshare==0.7.5
Pillow @ file:///home/conda/feedstock_root/build_artifacts/pillow_1666920566244/work
pkgutil_resolve_name==1.3.10
platformdirs==2.5.4
plotly==5.11.0
prometheus-client==0.15.0
promise==2.3
prompt-toolkit==3.0.31
protobuf==4.21.9
psutil==5.9.4
ptyprocess==0.7.0
pudb==2022.1.3
pure-eval==0.2.2
py==1.11.0
pyarrow==11.0.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pybids==0.15.5
pycparser @ file:///home/conda/feedstock_root/build_artifacts/pycparser_1636257122734/work
Pygments==2.13.0
pyOpenSSL @ file:///home/conda/feedstock_root/build_artifacts/pyopenssl_1665350324128/work
pyparsing==3.0.9
pyrsistent==0.19.2
PySocks @ file:///home/conda/feedstock_root/build_artifacts/pysocks_1661604839144/work
python-dateutil==2.8.2
pytorch-lightning==1.9.2
pytz==2022.6
PyYAML==6.0
pyzmq==24.0.1
qtconsole==5.4.0
QtPy==2.3.0
rdflib==6.2.0
rdkit==2022.9.4
regex==2022.10.31
requests @ file:///home/conda/feedstock_root/build_artifacts/requests_1661872987712/work
responses==0.18.0
retry==0.9.2
rich==13.3.2
rjsonnet==0.5.2
rsa==4.9
s3transfer==0.6.0
sacremoses==0.0.53
scikit-learn==1.1.3
scipy==1.9.3
seaborn==0.12.1
Send2Trash==1.8.0
sentry-sdk==1.10.1
setproctitle==1.3.2
shortuuid==1.0.10
six @ file:///home/conda/feedstock_root/build_artifacts/six_1620240208055/work
smmap==5.0.0
sniffio==1.3.0
soupsieve==2.3.2.post1
SQLAlchemy==1.3.24
sqlitedict==2.1.0
stack-data==0.5.1
templateflow==0.8.1
tenacity==8.1.0
terminado==0.17.0
threadpoolctl==3.1.0
tinycss2==1.2.1
tokenizers==0.13.2
toml==0.10.2
tomli==2.0.1
torch==1.11.0
torch-geometric==2.1.0.post1
torch-scatter==2.0.9
torch-sparse==0.6.15
torchaudio==0.11.0
torchmetrics==0.11.0
torchvision==0.12.0
torchviz==0.0.2
tornado==6.2
tqdm==4.64.1
traitlets==5.4.0
transformers==4.16.2
typing_extensions @ file:///home/conda/feedstock_root/build_artifacts/typing_extensions_1665144421445/work
urllib3 @ file:///home/conda/feedstock_root/build_artifacts/urllib3_1658789158161/work
urwid==2.1.2
urwid-readline==0.13
wandb==0.13.5
wcwidth==0.2.5
webdataset==0.1.103
webencodings==0.5.1
websocket-client==1.4.2
widgetsnbextension==4.0.3
wrapt==1.14.1
xxhash==3.2.0
yacs==0.1.8
yarl==1.8.2
zipp==3.10.0
Boltzmachine added the bug label Mar 27, 2023
@Boltzmachine
Author

If I run the same code without tango, i.e. directly with the python command, I don't get any error.
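
A possible explanation for the difference: on Linux with Python 3.8 the default multiprocessing start method is "fork", which does not pickle the dataset, whereas the traceback from the tango run goes through popen_spawn_posix.py, i.e. "spawn". One common workaround, sketched below under stated assumptions, is to keep only the path in __init__ and open the file lazily on first access, so the dataset holds no handle at the moment it is pickled. LazyFileDataset is a hypothetical name, and a plain binary file stands in for h5py.File so that the sketch runs without h5py or torch:

```python
import os
import pickle
import tempfile


class LazyFileDataset:
    """Opens its file on first item access instead of in __init__.

    Stand-in for a torch.utils.data.Dataset backed by h5py (assumption).
    """

    def __init__(self, path: str, record_size: int = 4) -> None:
        self.path = path
        self.record_size = record_size
        self._file = None  # no handle yet, so the object pickles cleanly

    def _handle(self):
        if self._file is None:
            # With h5py this line would be: h5py.File(self.path, "r")
            self._file = open(self.path, "rb")
        return self._file

    def __len__(self) -> int:
        return os.path.getsize(self.path) // self.record_size

    def __getitem__(self, idx: int) -> bytes:
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        f = self._handle()
        f.seek(idx * self.record_size)
        return f.read(self.record_size)

    def __getstate__(self):
        # Drop any open handle so an already-used dataset still pickles.
        state = self.__dict__.copy()
        state["_file"] = None
        return state

    def close(self) -> None:
        if self._file is not None:
            self._file.close()
            self._file = None


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"aaaabbbbcccc")
    ds = LazyFileDataset(tmp.name)
    print(ds[1])  # opens the file in this process
    clone = pickle.loads(pickle.dumps(ds))  # the handle is not carried over
    print(clone[2])  # the copy opens its own handle
    ds.close()
    clone.close()
    os.unlink(tmp.name)
```

In a real setup the class would subclass torch.utils.data.Dataset, and each DataLoader worker would open its own handle the first time it reads an item.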
