
cuda error #186

Open

hky666 opened this issue Jan 27, 2024 · 0 comments

hky666 commented Jan 27, 2024

When I run

```shell
python inference.py mypdb.fasta data/pdb_mmcif/mmcif_files/ \
    --use_precomputed_alignments ./alignments \
    --output_dir ./ \
    --gpus 4 \
    --model_preset multimer \
    --uniref90_database_path data/uniref90/uniref90.fasta \
    --mgnify_database_path data/mgnify/mgy_clusters_2022_05.fa \
    --pdb70_database_path data/pdb70/pdb70 \
    --uniref30_database_path data/uniref30/UniRef30_2021_03 \
    --bfd_database_path data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --uniprot_database_path data/uniprot/uniprot.fasta \
    --pdb_seqres_database_path data/pdb_seqres/pdb_seqres.txt \
    --param_path data/params/params_model_1_multimer_v3.npz \
    --model_name model_1_multimer_v3 \
    --jackhmmer_binary_path $(which jackhmmer) \
    --hhblits_binary_path $(which hhblits) \
    --hhsearch_binary_path $(which hhsearch) \
    --kalign_binary_path $(which kalign) \
    --enable_workflow \
    --inplace
```
it fails with:

```
running in multimer mode...
[01/26/24 20:11:10] INFO colossalai - colossalai - INFO: /home/khuang/anaconda3/envs/fastfold/lib/python3.8/site-packages/colossalai/context/parallel_context.py:521 set_device
INFO colossalai - colossalai - INFO: process rank 0 is bound to device 0
[01/26/24 20:11:10] INFO colossalai - colossalai - INFO: /home/khuang/anaconda3/envs/fastfold/lib/python3.8/site-packages/colossalai/context/parallel_context.py:521 set_device
[01/26/24 20:11:10] INFO colossalai - colossalai - INFO: /home/khuang/anaconda3/envs/fastfold/lib/python3.8/site-packages/colossalai/context/parallel_context.py:521 set_device
INFO colossalai - colossalai - INFO: process rank 3 is bound to device 3
[01/26/24 20:11:10] INFO colossalai - colossalai - INFO: /home/khuang/anaconda3/envs/fastfold/lib/python3.8/site-packages/colossalai/context/parallel_context.py:521 set_device
INFO colossalai - colossalai - INFO: process rank 2 is bound to device 2
INFO colossalai - colossalai - INFO: process rank 1 is bound to device 1
[01/26/24 20:11:12] INFO colossalai - colossalai - INFO: /home/khuang/anaconda3/envs/fastfold/lib/python3.8/site-packages/colossalai/context/parallel_context.py:557 set_seed
[01/26/24 20:11:12] INFO colossalai - colossalai - INFO: /home/khuang/anaconda3/envs/fastfold/lib/python3.8/site-packages/colossalai/context/parallel_context.py:557 set_seed
[01/26/24 20:11:12] INFO colossalai - colossalai - INFO: /home/khuang/anaconda3/envs/fastfold/lib/python3.8/site-packages/colossalai/context/parallel_context.py:557 set_seed
INFO colossalai - colossalai - INFO: initialized seed on rank 2, numpy: 1024, python random: 1024, ParallelMode.DATA: 1024, ParallelMode.TENSOR: 1026, the default parallel seed is ParallelMode.DATA.
INFO colossalai - colossalai - INFO: initialized seed on rank 3, numpy: 1024, python random: 1024, ParallelMode.DATA: 1024, ParallelMode.TENSOR: 1027, the default parallel seed is ParallelMode.DATA.
INFO colossalai - colossalai - INFO: initialized seed on rank 1, numpy: 1024, python random: 1024, ParallelMode.DATA: 1024, ParallelMode.TENSOR: 1025, the default parallel seed is ParallelMode.DATA.
[01/26/24 20:11:12] INFO colossalai - colossalai - INFO: /home/khuang/anaconda3/envs/fastfold/lib/python3.8/site-packages/colossalai/context/parallel_context.py:557 set_seed
INFO colossalai - colossalai - INFO: initialized seed on rank 0, numpy: 1024, python random: 1024, ParallelMode.DATA: 1024, ParallelMode.TENSOR: 1024, the default parallel seed is ParallelMode.DATA.
INFO colossalai - colossalai - INFO: /home/khuang/anaconda3/envs/fastfold/lib/python3.8/site-packages/colossalai/initialize.py:116 launch
INFO colossalai - colossalai - INFO: Distributed environment is initialized, data parallel size: 1, pipeline parallel size: 1, tensor parallel size: 4
```
```
Traceback (most recent call last):
  File "inference.py", line 556, in <module>
    main(args)
  File "inference.py", line 164, in main
    inference_multimer_model(args)
  File "inference.py", line 293, in inference_multimer_model
    torch.multiprocessing.spawn(inference_model, nprocs=args.gpus, args=(args.gpus, result_q, batch, args))
  File "/home/khuang/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/khuang/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
    while not context.join():
  File "/home/khuang/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/khuang/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/home/khuang/video/FastFold-main/inference.py", line 151, in inference_model
    out = model(batch)
  File "/home/khuang/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/khuang/video/FastFold-main/fastfold/model/hub/alphafold.py", line 522, in forward
    outputs, m_1_prev, z_prev, x_prev = self.iteration(
  File "/home/khuang/video/FastFold-main/fastfold/model/hub/alphafold.py", line 209, in iteration
    else self.input_embedder(feats)
  File "/home/khuang/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/khuang/video/FastFold-main/fastfold/model/nn/embedders_multimer.py", line 141, in forward
    tf_emb_i = self.linear_tf_z_i(tf)
  File "/home/khuang/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/khuang/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)
```

Has anyone run into this?
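For anyone triaging a similar failure: CUDA errors are reported asynchronously, so the `F.linear` frame in the trace may not be the real culprit (rerunning with `CUDA_LAUNCH_BLOCKING=1` gives an accurate stack). A minimal sketch to rule out a broken GPU or driver setup, independent of FastFold — `check_matmul` is a hypothetical helper written for this issue, not part of any library; it just runs the same float32 GEMM that `F.linear` dispatches to cuBLAS on each visible device:

```python
import torch

def check_matmul(device: str = "cpu", n: int = 256) -> bool:
    """Run a small float32 matmul (the op behind F.linear) on `device`
    and report whether the result is finite."""
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    c = a @ b  # on CUDA devices this goes through cublasSgemm
    return bool(torch.isfinite(c).all())

if __name__ == "__main__":
    # CPU first as a baseline, then every GPU PyTorch can see.
    devices = ["cpu"] + [f"cuda:{i}" for i in range(torch.cuda.device_count())]
    for dev in devices:
        try:
            print(dev, "ok" if check_matmul(dev) else "non-finite output")
        except RuntimeError as exc:
            print(dev, "failed:", exc)
```

If one of the four devices fails here too, the problem is the environment (driver/CUDA/cuBLAS mismatch) rather than FastFold's multimer path.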
