
[Question/problem] "Resource temporarily unavailable" when pushing data to the output store #8

Open
alexenge opened this issue Aug 15, 2022 · 3 comments


@alexenge

First of all, thanks a lot for developing DataLad and this amazing workflow, and congrats on the beautiful paper!

I'm trying to use the workflow on our HPC and it mostly works fine. However, when trying to push the job-specific outputs from the datalad containers-run command back to the output store, I frequently encounter the following error message:

+ datalad push --dataset derivatives --to output-storage
[INFO] Determine push target 
[INFO] Push refspecs 
[INFO] Transfer data 
CommandError: 'git -c diff.ignoreSubmodules=none annex copy --batch -z --to output-storage --fast --json --json-error-messages --json-progress -c annex.dotfiles=true' failed with exitcode 1 under /cobra/ptmp/aenge/tmp/ds_job_5823363/derivatives [info keys: stdout_json]
> to output-storage...
  [Errno 11] Resource temporarily unavailable: '.git/annex/objects/gV/qM/MD5E-s2208870400--827c9a19740e1b612fd3c639b3a849ef/MD5E-s2208870400--827c9a19740e1b612fd3c639b3a849ef' -> '/cobra/ptmp/aenge/slang/data/.outputstore/f71/91657-585a-441a-ac8d-fc95fa4b04b6/ora-remote-e6a93028-fb7f-4f15-b1f1-a13be58fd000/transfer/MD5E-s2208870400--827c9a19740e1b612fd3c639b3a849ef'
  This could have failed because --fast is enabled.
copy: 1 failed

As a bit of context: in my batch jobs (using SLURM) I'm cloning a BIDS dataset and then preprocessing the data from a single participant using afni_proc.py. I'm defining the resulting pre-processed anatomical and time-series data, as well as some first-level statistical maps, as --outputs in datalad containers-run, so these are the files that should get pushed to the output store and later merged back into my main BIDS dataset.

I haven't yet been able to determine exactly when and for what kind of files this error occurs – right now it seems to be pretty random. Of course, this might be very particular to the setup of our HPC. But I still wanted to ask if you have any experience or ideas for how to deal with this kind of error.

Thanks a lot in advance!

@adswa
Collaborator

adswa commented Aug 16, 2022

Hi! Could you share some information about the system (in particular the file system) you are running this on? datalad wtf can provide a report.

@alexenge
Author

Thanks a lot for the quick reply! Here's our datalad wtf:

# WTF
## configuration <SENSITIVE, report disabled by configuration>
## credentials 
  - keyring: 
    - active_backends: 
      - PlaintextKeyring with no encyption v.1.0 at /u/aenge/.local/share/python_keyring/keyring_pass.cfg
    - config_file: /u/aenge/.config/python_keyring/keyringrc.cfg
    - data_root: /u/aenge/.local/share/python_keyring
## datalad 
  - version: 0.15.3
## dataset 
  - branches: 
    - custom-templates@873067a
    - git-annex@584d3e9
    - master@0e8a34f
  - id: f7191657-585a-441a-ac8d-fc95fa4b04b6
  - metadata: <SENSITIVE, report disabled by configuration>
  - path: /ptmp/aenge/slang/data/derivatives
  - repo: AnnexRepo
## dependencies 
  - annexremote: 1.5.0
  - appdirs: 1.4.4
  - boto: 2.49.0
  - cmd:7z: 16.02
  - cmd:annex: 8.20211118-g23ee48898
  - cmd:bundled-git: 2.34.0
  - cmd:git: 2.34.0
  - cmd:system-git: 2.32.0
  - cmd:system-ssh: 7.2p2
  - exifread: 2.3.2
  - humanize: 3.13.1
  - iso8601: 1.0.2
  - keyring: 23.4.0
  - keyrings.alt: 4.0.2
  - msgpack: 1.0.2
  - mutagen: 1.45.1
  - requests: 2.26.0
  - wrapt: 1.12.1
## environment 
  - GIT_ASKPASS: /cobra/u/aenge/.vscode-server/bin/6d9b74a70ca9c7733b29f0456fd8195364076dda/extensions/git/dist/askpass.sh
  - LANG: /usr/lib/locale/en_US
  - LC_ALL: en_US.UTF8
  - PATH: /u/aenge/.local/bin://u/aenge/bin:/cobra/u/aenge/.vscode-server/bin/6d9b74a70ca9c7733b29f0456fd8195364076dda/bin/remote-cli:/opt/containers/singularity/bin:/u/aenge/.local/bin://u/aenge/bin:/u/aenge/conda-envs/slang/bin:/mpcdf/soft/SLE_12/packages/x86_64/anaconda/3/2020.02/condabin:/mpcdf/soft/SLE_12/packages/x86_64/Modules/5.0.1/bin:/u/aenge/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/games:/usr/lib/mit/bin:/usr/lib/mit/sbin:/afs/ipp/amd64_sles12/bin:/mpcdf/soft/SLE_12/packages/x86_64/find-module/1.0/bin
  - PYTHONSTARTUP: /etc/pythonstart
## extensions 
  - container: 
    - description: Containerized environments
    - entrypoints: 
      - datalad_container.containers_add.ContainersAdd: 
        - class: ContainersAdd
        - load_error: None
        - module: datalad_container.containers_add
        - names: 
          - containers-add
          - containers_add
      - datalad_container.containers_list.ContainersList: 
        - class: ContainersList
        - load_error: None
        - module: datalad_container.containers_list
        - names: 
          - containers-list
          - containers_list
      - datalad_container.containers_remove.ContainersRemove: 
        - class: ContainersRemove
        - load_error: None
        - module: datalad_container.containers_remove
        - names: 
          - containers-remove
          - containers_remove
      - datalad_container.containers_run.ContainersRun: 
        - class: ContainersRun
        - load_error: None
        - module: datalad_container.containers_run
        - names: 
          - containers-run
          - containers_run
    - load_error: None
    - module: datalad_container
    - version: 1.1.5
## git-annex 
  - build flags: 
    - Assistant
    - Webapp
    - Pairing
    - Inotify
    - DBus
    - DesktopNotify
    - TorrentParser
    - MagicMime
    - Feeds
    - Testsuite
    - S3
    - WebDAV
  - dependency versions: 
    - aws-0.22
    - bloomfilter-2.0.1.0
    - cryptonite-0.26
    - DAV-1.3.4
    - feed-1.3.0.1
    - ghc-8.8.4
    - http-client-0.6.4.1
    - persistent-sqlite-2.10.6.2
    - torrent-10000.1.1
    - uuid-1.3.13
    - yesod-1.6.1.0
  - key/value backends: 
    - SHA256E
    - SHA256
    - SHA512E
    - SHA512
    - SHA224E
    - SHA224
    - SHA384E
    - SHA384
    - SHA3_256E
    - SHA3_256
    - SHA3_512E
    - SHA3_512
    - SHA3_224E
    - SHA3_224
    - SHA3_384E
    - SHA3_384
    - SKEIN256E
    - SKEIN256
    - SKEIN512E
    - SKEIN512
    - BLAKE2B256E
    - BLAKE2B256
    - BLAKE2B512E
    - BLAKE2B512
    - BLAKE2B160E
    - BLAKE2B160
    - BLAKE2B224E
    - BLAKE2B224
    - BLAKE2B384E
    - BLAKE2B384
    - BLAKE2BP512E
    - BLAKE2BP512
    - BLAKE2S256E
    - BLAKE2S256
    - BLAKE2S160E
    - BLAKE2S160
    - BLAKE2S224E
    - BLAKE2S224
    - BLAKE2SP256E
    - BLAKE2SP256
    - BLAKE2SP224E
    - BLAKE2SP224
    - SHA1E
    - SHA1
    - MD5E
    - MD5
    - WORM
    - URL
    - X*
  - local repository version: 8
  - operating system: linux x86_64
  - remote types: 
    - git
    - gcrypt
    - p2p
    - S3
    - bup
    - directory
    - rsync
    - web
    - bittorrent
    - webdav
    - adb
    - tahoe
    - glacier
    - ddar
    - git-lfs
    - httpalso
    - borg
    - hook
    - external
  - supported repository versions: 
    - 8
  - upgrade supported from repository versions: 
    - 0
    - 1
    - 2
    - 3
    - 4
    - 5
    - 6
    - 7
  - version: 8.20211118-g23ee48898
## location 
  - path: /ptmp/aenge/slang/data/derivatives
  - type: dataset
## metadata_extractors 
  - annex (datalad 0.15.3): 
    - distribution: datalad 0.15.3
    - load_error: None
    - module: datalad.metadata.extractors.annex
    - version: None
  - audio (datalad 0.15.3): 
    - distribution: datalad 0.15.3
    - load_error: None
    - module: datalad.metadata.extractors.audio
    - version: None
  - datacite (datalad 0.15.3): 
    - distribution: datalad 0.15.3
    - load_error: None
    - module: datalad.metadata.extractors.datacite
    - version: None
  - datalad_core (datalad 0.15.3): 
    - distribution: datalad 0.15.3
    - load_error: None
    - module: datalad.metadata.extractors.datalad_core
    - version: None
  - datalad_rfc822 (datalad 0.15.3): 
    - distribution: datalad 0.15.3
    - load_error: None
    - module: datalad.metadata.extractors.datalad_rfc822
    - version: None
  - exif (datalad 0.15.3): 
    - distribution: datalad 0.15.3
    - load_error: None
    - module: datalad.metadata.extractors.exif
    - version: None
  - frictionless_datapackage (datalad 0.15.3): 
    - distribution: datalad 0.15.3
    - load_error: None
    - module: datalad.metadata.extractors.frictionless_datapackage
    - version: None
  - image (datalad 0.15.3): 
    - distribution: datalad 0.15.3
    - load_error: None
    - module: datalad.metadata.extractors.image
    - version: None
  - xmp (datalad 0.15.3): 
    - distribution: datalad 0.15.3
    - load_error: ModuleNotFoundError(No module named 'libxmp')
    - module: datalad.metadata.extractors.xmp
## metadata_indexers 
## python 
  - implementation: CPython
  - version: 3.8.12
## system 
  - distribution: sles/12.5/n/a
  - encoding: 
    - default: utf-8
    - filesystem: utf-8
    - locale.prefered: UTF-8
  - max_path_length: 290
  - name: Linux
  - release: 4.12.14-122.124-default
  - type: posix
  - version: #1 SMP Tue Jun 7 11:05:50 UTC 2022 (c120486)

Please do let me know if you need further information 🙂

@alexenge
Author

Any idea why this error may occur?

After playing around a bit, I've noticed that:

  • It only happens for files larger than approx. 100 MB (e.g., pre-processed BOLD data in AFNI's .BRIK format)
  • It only happens during batch jobs (using SLURM's sbatch), whereas running the same datalad push command in an interactive terminal works perfectly fine and pushes all of the data within a couple of seconds. Note that one difference is that the sbatch jobs don't have an internet connection, but since the output store is local, this shouldn't be the issue?
  • The error comes up pretty much immediately after the datalad push command starts, so it's not like it's trying for a long time (or retrying multiple times). So there doesn't seem to be a timeout involved (or if so, it's super short).
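For reference, errno 11 on Linux is EAGAIN, the code behind "Resource temporarily unavailable". The kernel returns it for transient conditions, e.g. when a fork() hits a process or memory limit, which can plausibly happen inside a SLURM job's resource limits when git-annex spawns transfer processes (that last part is speculation on my side, not something DataLad reports). A quick sanity check in Python:

```python
import errno
import os

# On Linux, errno 11 corresponds to EAGAIN, the code behind
# "Resource temporarily unavailable".
assert errno.EAGAIN == 11
print(os.strerror(errno.EAGAIN))
```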

Things I've tried to resolve the error (unsuccessfully thus far):

  • Made sure all output files are added to git-annex instead of git, even if they are non-binary (since AFNI produces some pretty large text files that would make git really slow)
  • Used --force checkdatapresent or --force all with datalad push. This gets rid of the line This could have failed because --fast is enabled. in the error message, but otherwise the error remains the same.

Any help or ideas would be very much appreciated 🙏
