
Soloseq inference - can't fold using ESM-1b alone #409

Open
amardeepranu opened this issue Feb 21, 2024 · 6 comments


@amardeepranu

amardeepranu commented Feb 21, 2024

The README states that template search for SoloSeq is skipped if no tools or databases are passed, and that the structure is then folded using the ESM embeddings alone. However, I get multiple errors when running the following command:

python openfold/run_pretrained_openfold.py \
    meth_fastas \
    openfold/data/pdb_mmcif/mmcif_files \
    --output_dir results \
    --model_device "cuda:0" \
    --config_preset "seq_model_esm1b_ptm" \
    --openfold_checkpoint_path openfold/openfold/resources/openfold_soloseq_params/seq_model_esm1b_ptm.pt
  1. If the mmcif_files are not downloaded, the script fails; I had to download them to get past this.
  2. Even after removing all tool- and database-related arguments, as in the command above, I still get an HHSearch error:
INFO:/root/openfold/openfold/openfold/utils/script_utils.py:Loaded OpenFold parameters at openfold/openfold/resources/openfold_soloseq_params/seq_model_esm1b_ptm.pt...
INFO:/root/openfold/openfold/run_pretrained_openfold.py:Generating alignments for MGYG000000044_01310...
Traceback (most recent call last):
  File "/root/openfold/openfold/run_pretrained_openfold.py", line 470, in <module>
    main(args)
  File "/root/openfold/openfold/run_pretrained_openfold.py", line 275, in main
    precompute_alignments(tags, seqs, alignment_dir, args)
  File "/root/openfold/openfold/run_pretrained_openfold.py", line 80, in precompute_alignments
    template_searcher = hhsearch.HHSearch(
  File "/root/openfold/openfold/openfold/data/tools/hhsearch.py", line 58, in __init__
    if not glob.glob(database_path + "_*"):
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

It still seems to be attempting to generate alignments. Is this a bug? Do I need to specify another flag to skip all tools and alignments and fold with ESM alone?
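The TypeError itself comes from concatenating None with a string: no template database path was passed, so database_path is None when HHSearch.__init__ evaluates database_path + "_*". A minimal sketch that reproduces the error and shows the kind of guard that would skip template search instead; make_template_glob and the pdb70/pdb70 path are hypothetical and not part of OpenFold.

```python
import glob
from typing import Optional


def make_template_glob(database_path: Optional[str]) -> Optional[str]:
    """Hypothetical guard: build the glob pattern only when a database path is set."""
    if database_path is None:
        # No template database was passed, so template search should be skipped.
        return None
    return database_path + "_*"


def reproduce_error() -> str:
    """Reproduce the TypeError from the traceback: None + str inside glob.glob()."""
    database_path = None
    try:
        glob.glob(database_path + "_*")
    except TypeError as exc:
        return str(exc)
    return "no error"


if __name__ == "__main__":
    print(reproduce_error())
    print(make_template_glob(None))
    print(make_template_glob("pdb70/pdb70"))
```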

Thank you.

@jnwei
Collaborator

jnwei commented Feb 22, 2024

Hi @amardeepranu, thanks for your interest in SoloSeq.

Could you try generating the embeddings first using:
python scripts/precompute_embeddings.py fasta_dir/ embeddings_output_dir/

Then run the same run_pretrained_openfold.py command, but with --use_precomputed_alignments=embeddings_output_dir.
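A sketch of this two-step workflow with the commands built as Python argument lists. The helper names embedding_cmd and soloseq_cmd are hypothetical, and the paths and flags are copied from this thread, so adjust them to your checkout and working directory.

```python
import shlex
import sys
from typing import List


def embedding_cmd(fasta_dir: str, embeddings_dir: str) -> List[str]:
    """Step 1: precompute ESM-1b embeddings for the FASTA files in fasta_dir."""
    return [sys.executable, "scripts/precompute_embeddings.py", fasta_dir, embeddings_dir]


def soloseq_cmd(fasta_dir: str, mmcif_dir: str, embeddings_dir: str) -> List[str]:
    """Step 2: the original inference command plus --use_precomputed_alignments."""
    return [
        sys.executable, "openfold/run_pretrained_openfold.py",
        fasta_dir,
        mmcif_dir,  # template mmCIF directory, positional as in the original command
        f"--use_precomputed_alignments={embeddings_dir}",
        "--output_dir", "results",
        "--model_device", "cuda:0",
        "--config_preset", "seq_model_esm1b_ptm",
        "--openfold_checkpoint_path",
        "openfold/openfold/resources/openfold_soloseq_params/seq_model_esm1b_ptm.pt",
    ]


if __name__ == "__main__":
    steps = [
        embedding_cmd("fasta_dir/", "embeddings_output_dir/"),
        soloseq_cmd("fasta_dir/", "openfold/data/pdb_mmcif/mmcif_files",
                    "embeddings_output_dir/"),
    ]
    for argv in steps:
        # Print each command; run it with subprocess.run(argv, check=True) when ready.
        print(shlex.join(argv))
```

Passing such a list to subprocess.run avoids shell line continuations entirely, so a stray space after a backslash cannot split the command.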

@amardeepranu
Author

@jnwei Thanks, that worked, but now I'm getting:

FileNotFoundError: [Errno 2] No such file or directory: 'openfold/resources/params/params_model_1.npz'
bash: line 6: --output_dir: command not found

It seems to require --jax_param_path. Is this required to run the folding?

@jnwei
Collaborator

jnwei commented Feb 23, 2024

There is a separate set of weights for SoloSeq, which you specified in your earlier command with this argument: --openfold_checkpoint_path openfold/openfold/resources/openfold_soloseq_params/seq_model_esm1b_ptm.pt

Judging by the "bash: line 6: --output_dir: command not found" message, perhaps there is a whitespace or newline character issue in the command?
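One common cause of this kind of message is a backslash line continuation followed by a trailing space or a Windows carriage return: bash then ends the command at that line and runs the next line (here, "--output_dir ...") as a separate command. A small sketch of a checker for that pattern; broken_continuations and SAMPLE are hypothetical names, and the script only inspects text, it does not run anything.

```python
import sys
from typing import List, Tuple

# Hypothetical example: a space follows the backslash on the second line.
SAMPLE = (
    "python openfold/run_pretrained_openfold.py \\\n"
    "    fastas \\ \n"
    "    --output_dir results\n"
)


def broken_continuations(script: str) -> List[Tuple[int, str]]:
    """Return (line_number, line) for lines whose trailing backslash is followed
    by spaces, tabs, or a carriage return.

    Bash only treats a backslash as a line continuation when the newline comes
    immediately after it; anything in between ends the command early.
    """
    problems = []
    # Split on "\n" only, so a "\r" from CRLF endings stays on the line.
    for lineno, line in enumerate(script.split("\n"), start=1):
        stripped = line.rstrip(" \t\r")
        if stripped.endswith("\\") and stripped != line:
            problems.append((lineno, line))
    return problems


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # newline="" keeps "\r" characters so CRLF endings are detected.
        with open(sys.argv[1], newline="") as fh:
            text = fh.read()
    else:
        text = SAMPLE
    for lineno, line in broken_continuations(text):
        print(f"line {lineno}: whitespace after backslash: {line!r}")
```

Run it with the path of the shell script that contains the inference command; with no argument it checks the built-in sample and reports line 2.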

@amardeepranu
Author

@jnwei this is my full command:

python openfold/run_pretrained_openfold.py \
    fastas \
    --use_precomputed_alignments embeddings/meth \
    --output_dir results \
    --model_device "cuda:0" \
    --config_preset "seq_model_esm1b_ptm" \
    --openfold_checkpoint_path openfold/openfold/resources/openfold_soloseq_params/seq_model_esm1b_ptm.pt

With this command, I get an error saying that template_mmcif_dir is required. Are templates required for ESM-only folding?

@jnwei
Collaborator

jnwei commented Feb 27, 2024

You will need to provide a directory for --template_mmcif_dir. Although the argument is required, templates are not necessary for folding predictions.

If your precomputed alignments directory does not contain any alignments for templates (e.g. it only contains the pre-computed ESM embeddings), then template structures will not be used for creating the prediction.
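A sketch of the check this paragraph describes: does a per-sequence alignment directory contain any template-hit files, or only embeddings? The file names in TEMPLATE_HIT_FILES are an assumption about what OpenFold's HHsearch and hmmsearch template searchers write, and has_template_hits is a hypothetical helper, so verify the names against your OpenFold version.

```python
import tempfile
from pathlib import Path
from typing import Iterable

# Assumption: template-hit file names written by OpenFold's HHsearch and
# hmmsearch template searchers. Verify against your OpenFold version.
TEMPLATE_HIT_FILES = ("pdb70_hits.hhr", "hmm_output.sto")


def has_template_hits(alignment_dir: Path,
                      hit_files: Iterable[str] = TEMPLATE_HIT_FILES) -> bool:
    """Return True if any of the assumed template-hit files exists in alignment_dir."""
    return any((alignment_dir / name).is_file() for name in hit_files)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        seq_dir = Path(tmp) / "example_seq"  # hypothetical sequence tag
        seq_dir.mkdir()
        (seq_dir / "example_seq.pt").touch()  # stand-in for an ESM embedding file
        print(has_template_hits(seq_dir))  # embeddings only, so no template hits
        (seq_dir / "pdb70_hits.hhr").touch()
        print(has_template_hits(seq_dir))  # a template-hit file is now present
```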

To help avoid this confusion, we may in the future refactor the inference script so that SoloSeq mode does not require a template_mmcif_dir when the template-based prediction path is not used.

@vaclavhanzl
Contributor

Note that the directory cannot be empty, so if you want to use no templates at all, you currently still need to add at least one placeholder file, e.g. with touch empty.cif.
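A sketch of that workaround in Python, equivalent to creating the directory and running touch empty.cif in it. The function name make_placeholder_template_dir and the placeholder_mmcif directory name are hypothetical; the resulting directory is what you would pass as the template mmCIF directory argument.

```python
import tempfile
from pathlib import Path


def make_placeholder_template_dir(path: str) -> Path:
    """Create a template mmCIF directory holding a single empty placeholder file.

    Mirrors `mkdir -p <path> && touch <path>/empty.cif`, since the inference
    script needs a non-empty template_mmcif_dir even when no templates are used.
    """
    template_dir = Path(path)
    template_dir.mkdir(parents=True, exist_ok=True)
    (template_dir / "empty.cif").touch()
    return template_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        template_dir = make_placeholder_template_dir(f"{tmp}/placeholder_mmcif")
        print(sorted(p.name for p in template_dir.iterdir()))
```

The function is idempotent: calling it again on an existing directory leaves the single empty placeholder in place.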
