HHblits Prefiltering database run time on Supercomputer Cluster #365

Open
Harryyang21 opened this issue Jan 7, 2024 · 3 comments

@Harryyang21

AlphaFold Stuck at hhblits Step on Cluster Compute Node

Issue Description:
I am experiencing a problem running AlphaFold on a compute node in our cluster. The process consistently gets stuck at the hhblits step. However, when I run the same program directly on the login node, it proceeds without any issues. The issue arises specifically when submitting the job to the compute node - it hangs at "- 22:54:01.709 INFO: Prefiltering database" and does not progress further.

Additionally, system call tracing shows repeated occurrences of the following:

futex(0x2b5ff13dd634, FUTEX_WAIT_PRIVATE, 4294967295, NULL) = 0
futex(0x2b5ff13dd634, FUTEX_WAKE_PRIVATE, 1) = 0
...
During this time, memory usage does not increase and the GPU does not appear to be computing, although the process itself appears to be under load.

On the login node, the same step produces normal output, like this:

- 22:15:22.605 INFO: Searching 32053680 column state sequences.
- 22:15:22.729 INFO: /tmp/yanghao2022/MSA_4508283962/seq.fasta is in A2M, A3M or FASTA format
- 22:15:22.730 INFO: Iteration 1
- 22:15:22.808 INFO: Prefiltering database
- 22:16:19.797 INFO: HMMs passed 1st prefilter (gapless profile-profile alignment) : 693794
- 22:16:25.614 INFO: HMMs passed 2nd prefilter (gapped profile-profile alignment) : 292
- 22:16:25.614 INFO: HMMs passed 2nd prefilter and not found in previous iterations : 292
- 22:16:25.614 INFO: Scoring 292 HMMs using HMM-HMM Viterbi alignment
- 22:16:26.110 INFO: Alternative alignment: 0
- 22:16:31.556 INFO: 292 alignments done
- 22:16:31.559 INFO: Alternative alignment: 1
- 22:16:31.623 INFO: 287 alignments done
- 22:16:31.624 INFO: Alternative alignment: 2
- 22:16:31.648 INFO: 20 alignments done
- 22:16:31.648 INFO: Alternative alignment: 3
- 22:16:31.679 INFO: 3 alignments done
- 22:16:31.984 INFO: Realigning 210 HMM-HMM alignments using Maximum Accuracy algorithm
- 22:16:33.013 INFO: 77 sequences belonging to 77 database HMMs found with an E-value < 0.001
- 22:16:33.013 INFO: Number of effective sequences of resulting query HMM: Neff = 5.92897
- 22:16:33.040 INFO: Iteration 2
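
Since the same command behaves differently on the login node and on the compute node, one thing that may be worth comparing between the two is how many CPUs the process can see versus how many it is actually allowed to use. The following is a minimal diagnostic sketch, assuming Linux and Python 3. The function name `cpu_report` is hypothetical, and reading `SLURM_CPUS_PER_TASK` assumes a Slurm-managed job. It only reports numbers and is not a confirmed explanation of the hang.

```python
import os


def cpu_report():
    """Return (logical_cpus, allowed_cpus, scheduler_cpus) for this process.

    logical_cpus   : CPUs the operating system reports on this node
    allowed_cpus   : CPUs this process may actually run on (Linux affinity mask)
    scheduler_cpus : value of SLURM_CPUS_PER_TASK, or None if it is not set
                     (assumption: the job runs under Slurm)
    """
    logical = os.cpu_count()
    allowed = len(os.sched_getaffinity(0))  # Linux-only call
    raw = os.environ.get("SLURM_CPUS_PER_TASK")
    scheduler = int(raw) if raw and raw.isdigit() else None
    return logical, allowed, scheduler


if __name__ == "__main__":
    logical, allowed, scheduler = cpu_report()
    print(f"logical CPUs on node : {logical}")
    print(f"CPUs allowed for job : {allowed}")
    print(f"SLURM_CPUS_PER_TASK  : {scheduler}")
```

Running this once on the login node and once inside a submitted job shows whether the job is confined to fewer CPUs than the node reports, which could matter if the thread count given to hhblits is based on the larger number.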

Environment Description:
_libgcc_mutex 0.1 main defaults
_openmp_mutex 5.1 1_gnu defaults
absl-py 0.13.0 pypi_0 pypi
astunparse 1.6.3 pypi_0 pypi
biopython 1.79 pypi_0 pypi
ca-certificates 2023.12.12 h06a4308_0 defaults
cachetools 5.3.2 pypi_0 pypi
certifi 2023.11.17 pypi_0 pypi
charset-normalizer 3.3.2 pypi_0 pypi
chex 0.0.7 pypi_0 pypi
click 8.1.7 pypi_0 pypi
contextlib2 21.6.0 pypi_0 pypi
cudatoolkit 11.3.1 h9edb442_10 conda-forge
cudatoolkit-dev 11.3.1 py38h497a2fe_0 conda-forge
cudnn 8.2.1.32 h86fa8c9_0 conda-forge
dm-haiku 0.0.4 pypi_0 pypi
dm-tree 0.1.6 pypi_0 pypi
fftw 3.3.10 nompi_h77c792f_102 conda-forge
flatbuffers 1.12 pypi_0 pypi
gast 0.4.0 pypi_0 pypi
google-auth 2.26.1 pypi_0 pypi
google-auth-oauthlib 0.4.6 pypi_0 pypi
google-pasta 0.2.0 pypi_0 pypi
grpcio 1.34.1 pypi_0 pypi
h5py 3.1.0 pypi_0 pypi
hhsuite 3.3.0 py38pl5321h8ded8fe_5 bioconda
hmmer 3.3.2 h87f3376_2 bioconda
idna 3.6 pypi_0 pypi
immutabledict 2.0.0 pypi_0 pypi
importlib-metadata 7.0.1 pypi_0 pypi
jax 0.2.14 pypi_0 pypi
jaxlib 0.1.69+cuda111 pypi_0 pypi
kalign2 2.04 hec16e2b_3 bioconda
keras-nightly 2.5.0.dev2021032900 pypi_0 pypi
keras-preprocessing 1.1.2 pypi_0 pypi
libblas 3.9.0 15_linux64_openblas conda-forge
libcblas 3.9.0 15_linux64_openblas conda-forge
libedit 3.1.20230828 h5eee18b_0 defaults
libffi 3.2.1 hf484d3e_1007 defaults
libgcc-ng 11.2.0 h1234567_1 defaults
libgfortran-ng 13.2.0 h69a702a_0 conda-forge
libgfortran5 13.2.0 ha4646dd_0 conda-forge
libgomp 11.2.0 h1234567_1 defaults
liblapack 3.9.0 15_linux64_openblas conda-forge
libnsl 2.0.0 h5eee18b_0 defaults
libopenblas 0.3.20 pthreads_h78a6416_0 conda-forge
libstdcxx-ng 11.2.0 h1234567_1 defaults
markdown 3.5.1 pypi_0 pypi
markupsafe 2.1.3 pypi_0 pypi
ml-collections 0.1.0 pypi_0 pypi
ncurses 6.4 h6a678d5_0 defaults
numpy 1.19.5 pypi_0 pypi
oauthlib 3.2.2 pypi_0 pypi
ocl-icd 2.3.1 h7f98852_0 conda-forge
ocl-icd-system 1.0.0 1 conda-forge
openmm 7.5.1 py38ha082873_1 conda-forge
openssl 1.1.1w h7f8727e_0 defaults
opt-einsum 3.3.0 pypi_0 pypi
pandas 1.3.4 pypi_0 pypi
pdbfixer 1.7 pyhd3deb0d_0 conda-forge
perl 5.32.1 0_h5eee18b_perl5 defaults
pillow 10.2.0 pypi_0 pypi
pip 23.3.2 pypi_0 pypi
protobuf 3.20.3 pypi_0 pypi
pyasn1 0.5.1 pypi_0 pypi
pyasn1-modules 0.3.0 pypi_0 pypi
python 3.8.0 h0371630_2 defaults
python-dateutil 2.8.2 pypi_0 pypi
python_abi 3.8 2_cp38 conda-forge
pytz 2023.3.post1 pypi_0 pypi
pyyaml 6.0.1 pypi_0 pypi
readline 7.0 h7b6447c_5 defaults
requests 2.31.0 pypi_0 pypi
requests-oauthlib 1.3.1 pypi_0 pypi
rsa 4.9 pypi_0 pypi
scipy 1.7.0 pypi_0 pypi
setuptools 68.2.2 py38h06a4308_0 defaults
six 1.15.0 pypi_0 pypi
sqlite 3.33.0 h62c20be_0 defaults
svgwrite 1.4.3 pypi_0 pypi
tabulate 0.9.0 pypi_0 pypi
tensorboard 2.11.2 pypi_0 pypi
tensorboard-data-server 0.6.1 pypi_0 pypi
tensorboard-plugin-wit 1.8.1 pypi_0 pypi
tensorflow 2.5.0 pypi_0 pypi
tensorflow-cpu 2.5.0 pypi_0 pypi
tensorflow-estimator 2.5.0 pypi_0 pypi
termcolor 1.1.0 pypi_0 pypi
tk 8.6.12 h1ccaba5_0 defaults
toolz 0.12.0 pypi_0 pypi
tree 0.2.4 pypi_0 pypi
typing-extensions 3.7.4.3 pypi_0 pypi
urllib3 2.1.0 pypi_0 pypi
werkzeug 3.0.1 pypi_0 pypi
wheel 0.41.2 py38h06a4308_0 defaults
wrapt 1.12.1 pypi_0 pypi
xz 5.4.5 h5eee18b_0 defaults
zipp 3.17.0 pypi_0 pypi
zlib 1.2.13 h5eee18b_0 defaults

AlphaFold Version: 2.3.2
Operating System and Version: CentOS Linux release 7.6.1810 (Core)
Thank you very much in advance

@miguelcorrea

Running into the same issue myself on an HPC running Slurm.

Issue #368 seems to be a duplicate of this one, so this is definitely not a one-off.

@milot-mirdita
Member

hhblits_omp has issues when run on many cores, for a reason that we were not able to pinpoint.

I recommend using a script similar to the following as a workaround:
https://github.com/soedinglab/hhdatabase_cif70/blob/master/pdb70_hhblits_lock.sh
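
Separately from the linked script (whose contents are not reproduced here), a simpler thing to try, given that the problem appears with many cores, is to cap the thread count passed to hhblits through its `-cpu` option. The sketch below only builds the command line. The function name `build_hhblits_cmd`, the default cap of 4, and the file paths are assumptions for illustration, not values from this thread, and it is not confirmed that capping threads avoids the hang.

```python
import os


def build_hhblits_cmd(query, database, out_a3m, max_threads=4, iterations=3):
    """Build an hhblits command with a capped -cpu value.

    max_threads is an assumed cap, not a value recommended by the maintainers.
    The thread count never exceeds the CPUs this process is allowed to use.
    """
    allowed = len(os.sched_getaffinity(0))  # Linux-only: CPUs available to this process
    threads = max(1, min(allowed, max_threads))
    return [
        "hhblits",
        "-i", query,            # input query sequence or alignment
        "-d", database,         # database basename
        "-oa3m", out_a3m,       # output alignment in A3M format
        "-n", str(iterations),  # number of search iterations
        "-cpu", str(threads),   # number of threads hhblits may use
    ]


if __name__ == "__main__":
    # Placeholder paths; replace with real files before running hhblits.
    cmd = build_hhblits_cmd("seq.fasta", "/path/to/db_basename", "out.a3m")
    print(" ".join(cmd))
    # To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Starting with a small cap and raising it step by step is one way to find out at which thread count the stall begins on a given node.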

@charliedhw

Running hhblits manually with the command in the picture works fine sometimes, but sometimes ends with errors. The jobs are submitted through Slurm.
When I run the command under strace, the following error occurs. I have no idea how to control that. Any suggestions?

'Thread creation failed: Invalid "..., 40Thread creation failed: Invalid argument'

[Image: hhblits-debug]
