recreating homo_search.py output -- minimal version #135

avilella · 2023-10-05T12:53:36Z

Hi,

I am running Uni-Fold on antibody-antigen pairs, where the antigen (chain A) is always the same, and the antibody sequences (chain B in each prediction) are very similar to each other (same species).

Since the homo_search.py step of run_unifold.sh (multimer) takes a long time but produces very similar hits each time, I would like to recreate its output in a new folder for new predictions, so that I only need to run the second step of run_unifold.sh, inference.py, on it.

My plan is to aggregate the .sto files from a batch of earlier predictions and produce a combined version in the new input folder structure for inference.py. The .sto format is a bit cumbersome to recreate. If inference.py does not read the alignment structure from it, but only the FASTA entries, would it be possible to provide the "combined inputs" as multi-FASTA files rather than .sto files?
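A minimal sketch of the aggregation step described above, assuming plain Stockholm files whose sequence lines are "name sequence" pairs (possibly interleaved across blocks). The function names read_stockholm and merge_to_fasta are hypothetical and not part of Uni-Fold. Stripping gaps discards the alignment columns, so this is only useful if the downstream step really needs just the sequences, which is exactly the open question here.

```python
from typing import Dict, Iterable


def read_stockholm(text: str) -> Dict[str, str]:
    """Parse Stockholm text into {name: aligned_sequence}.

    Interleaved blocks are joined by concatenating chunks per name.
    """
    seqs: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, markup lines (#=GF, #=GS, ...) and the "//" terminator.
        if not line or line.startswith("#") or line == "//":
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        name, chunk = parts
        seqs[name] = seqs.get(name, "") + chunk.strip()
    return seqs


def merge_to_fasta(sto_texts: Iterable[str]) -> str:
    """Combine several Stockholm alignments into one gap-free multi-FASTA string.

    Sequences that are identical after gap removal are written only once.
    """
    seen = set()
    records = []
    for text in sto_texts:
        for name, aligned in read_stockholm(text).items():
            # "-" and "." are both gap characters in Stockholm files.
            seq = aligned.replace("-", "").replace(".", "").upper()
            if seq and seq not in seen:
                seen.add(seq)
                records.append(f">{name}\n{seq}\n")
    return "".join(records)


# Hypothetical usage (the "runs" directory layout is an assumption):
#   from pathlib import Path
#   texts = (p.read_text() for p in sorted(Path("runs").glob("*/B/uniprot_hits.sto")))
#   Path("combined_B_uniprot.fasta").write_text(merge_to_fasta(texts))
```

Deduplicating on the ungapped sequence keeps the combined file small when the antibody hits overlap heavily between runs, at the cost of keeping only the first header seen for each sequence.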

Thanks in advance.

[       4096 Oct  3 15:59]  ./B
[  231414423 Oct  3 15:59]  ./B/uniprot_hits.sto
[   62590981 Oct  3 15:59]  ./B/pdb_hits.sto
[     516587 Oct  3 15:59]  ./B/mgnify_hits.sto
[     462122 Oct  3 15:59]  ./B/bfd_uniclust_hits.a3m
[  184892903 Oct  3 15:59]  ./B/uniref90_hits.sto
[    1788444 Oct  3 15:59]  ./B.uniprot.pkl.gz
[         81 Oct  3 15:59]  ./B.timings.json
[          3 Oct  3 15:59]  ./chains.txt
[        833 Oct  3 15:59]  ./chain_id_map.json
[     811365 Oct  3 15:59]  ./B.feature.pkl.gz
[      31503 Oct  3 15:59]  ./A.uniprot.pkl.gz
[         80 Oct  3 15:59]  ./A.timings.json
[     282367 Oct  3 15:59]  ./A.feature.pkl.gz
[       4096 Oct  3 15:59]  ./A
[    1139693 Oct  3 15:59]  ./A/uniref90_hits.sto
[     966652 Oct  3 15:59]  ./A/uniprot_hits.sto
[   40494816 Oct  3 15:59]  ./A/pdb_hits.sto
[       3189 Oct  3 15:59]  ./A/mgnify_hits.sto
[     233861 Oct  3 15:59]  ./A/bfd_uniclust_hits.a3m
[        255 Oct  3 15:59]  ./1b634d49dfcce4784af7c9bbb7d53496.TRI002.mmer_B.fasta
[        123 Oct  3 15:59]  ./1b634d49dfcce4784af7c9bbb7d53496.TRI002.mmer_A.fasta


ZiyaoLi · 2023-10-08T02:59:31Z

I would recommend referring to the mmseqs processing code here and here. It has a lighter processing pipeline.

avilella · 2023-10-09T14:29:17Z

If I am reading the code in inference.py correctly, does it read from uniprot_msa_dir in the multimer case?

def load_feature_for_one_target(
    config, data_folder, seed=0, is_multimer=False, use_uniprot=False
):
    if not is_multimer:
        uniprot_msa_dir = None
        sequence_ids = ["A"]
        if use_uniprot:
            uniprot_msa_dir = data_folder

    else:
        uniprot_msa_dir = data_folder
        sequence_ids = open(os.path.join(data_folder, "chains.txt")).readline().split()
    batch, _ = load_and_process(
        config=config.data,
        mode="predict",
        seed=seed,
        batch_idx=None,
        data_idx=0,
        is_distillation=False,
        sequence_ids=sequence_ids,
        monomer_feature_dir=data_folder,
        uniprot_msa_dir=uniprot_msa_dir,
        is_monomer=(not is_multimer),
    )
    batch = UnifoldDataset.collater([batch])
    return batch


def main(args):
    config = model_config(args.model_name)
    config.data.common.max_recycling_iters = args.max_recycling_iters

ZiyaoLi · 2023-10-12T10:19:43Z

Yes. UniProt MSAs are used for MSA pairing because they contain species information.
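To illustrate why species information matters for pairing, here is a rough sketch that pulls the species mnemonic out of UniProt-style headers and finds the species present in both chains' hits. It assumes headers shaped like "sp|ACCESSION|ENTRY_SPECIES/start-end". The regular expression and the names species_of and shared_species are illustrative assumptions, not Uni-Fold's actual pairing code.

```python
import re
from typing import Iterable, Optional, Set

# Assumed header shape: (tr|sp)|ACCESSION|ENTRY_SPECIES, optionally followed by /start-end.
_UNIPROT_HEADER = re.compile(
    r"^(?:tr|sp)\|[A-Za-z0-9]{6,10}(?:_\d)?\|"
    r"[A-Za-z0-9]+_(?P<species>[A-Za-z0-9]{1,5})(?:/\d+-\d+)?$"
)


def species_of(header: str) -> Optional[str]:
    """Return the species mnemonic (e.g. "HUMAN"), or None if the header does not match."""
    tokens = header.strip().lstrip(">").split()
    if not tokens:
        return None
    match = _UNIPROT_HEADER.match(tokens[0])
    return match.group("species") if match else None


def shared_species(headers_a: Iterable[str], headers_b: Iterable[str]) -> Set[str]:
    """Species that occur in the hits of both chains, i.e. candidates for paired rows."""
    species_a = {s for s in map(species_of, headers_a) if s}
    species_b = {s for s in map(species_of, headers_b) if s}
    return species_a & species_b
```

Hits whose headers carry no recognizable species (for example, after being rewritten into a plain multi-FASTA with custom names) would drop out of such a comparison, so any combined input should preserve the original UniProt identifiers.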
