Homodimer prediction missing chain #163

Open
s-kyungyong opened this issue Mar 17, 2023 · 5 comments

s-kyungyong commented Mar 17, 2023

Hi

While checking the multimer outputs, I realized that for homodimers, there is only a single chain, B.

python /global/scratch/users/skyungyong/Software/FastFold/inference.py --output_dir ./ --model_preset multimer --use_precomputed_alignments Alignments --enable_workflow --inplace --param_path /global/scratch/users/skyungyong/Software/FastFold/data/params/params_model_1_multimer_v3.npz --model_name model_1_multimer AT1G52380_and_AT1G52380.fasta /global/scratch/users/skyungyong/Software/alphafold-multimer-v2.2.2-080922/Database/pdb_mmcif/mmcif_files/

cat AT1G52380-AT1G52380.fasta
>AT1G52380
MGDSENVQQPSKKRGALKQLSRDNPGLDDDDDSAELESGTFKKASDEVLASRRIVRVKRKEPSAAPVAASNPFAGIQLVPTTAPASTPVGTNAPLAESKLAPAEAVVEDNQKASDIEEGDEVDSKKVDVKDAVGEETEKTKDKDDNHCGKSADVQVAATEVAQMVSCDTNVCNNAVEGTDQTDFPLEKDSGGDQAEKKEKEGNGIEEADKNGDNGAFSSFQQHSSNKNAFTGLASTEASGSSFSFGLVSQDGSTGTGSLFGFGLPSSNSSSIFGATGSSIIKKSEGSGFPPKQEVSTETGEENEKVAFSADSIMFEYLDGGWKERGKGELKVNVSSNDGKARLVMRAKGNYRLILNASLYPEMKLANMDKKGITFACVNSVSEGKEGLSTFALKFKDPTIVEEFRVAIDKHKDSKPMEKAAEKSALPLKTPENSPTATDT
>AT1G52380
MGDSENVQQPSKKRGALKQLSRDNPGLDDDDDSAELESGTFKKASDEVLASRRIVRVKRKEPSAAPVAASNPFAGIQLVPTTAPASTPVGTNAPLAESKLAPAEAVVEDNQKASDIEEGDEVDSKKVDVKDAVGEETEKTKDKDDNHCGKSADVQVAATEVAQMVSCDTNVCNNAVEGTDQTDFPLEKDSGGDQAEKKEKEGNGIEEADKNGDNGAFSSFQQHSSNKNAFTGLASTEASGSSFSFGLVSQDGSTGTGSLFGFGLPSSNSSSIFGATGSSIIKKSEGSGFPPKQEVSTETGEENEKVAFSADSIMFEYLDGGWKERGKGELKVNVSSNDGKARLVMRAKGNYRLILNASLYPEMKLANMDKKGITFACVNSVSEGKEGLSTFALKFKDPTIVEEFRVAIDKHKDSKPMEKAAEKSALPLKTPENSPTATDT
head AT1G52380_and_AT1G52380_model_1_multimer_unrelaxed.pdb
MODEL     1
ATOM      1  N   MET B   1      -8.565 -70.079 -22.331  1.00 42.70           N
ATOM      2  CA  MET B   1      -9.393 -69.112 -21.615  1.00 42.70           C
ATOM      3  C   MET B   1      -8.665 -67.781 -21.460  1.00 42.70           C
ATOM      4  CB  MET B   1      -9.789 -69.655 -20.241  1.00 42.70           C
ATOM      5  O   MET B   1      -7.683 -67.688 -20.721  1.00 42.70           O
ATOM      6  CG  MET B   1     -11.003 -70.569 -20.269  1.00 42.70           C
ATOM      7  SD  MET B   1     -11.648 -70.934 -18.590  1.00 42.70           S
ATOM      8  CE  MET B   1     -13.189 -71.791 -19.019  1.00 42.70           C
ATOM      9  N   GLY B   2      -8.606 -67.101 -22.604  1.00 46.29           N

tail AT1G52380_and_AT1G52380_model_1_multimer_unrelaxed.pdb
ATOM   3260  N   THR B 440     -34.688  39.568  42.140  1.00 57.09           N
ATOM   3261  CA  THR B 440     -34.924  39.376  43.566  1.00 57.09           C
ATOM   3262  C   THR B 440     -35.080  40.719  44.274  1.00 57.09           C
ATOM   3263  CB  THR B 440     -33.782  38.577  44.220  1.00 57.09           C
ATOM   3264  O   THR B 440     -34.344  41.665  43.989  1.00 57.09           O
ATOM   3265  CG2 THR B 440     -34.130  37.095  44.312  1.00 57.09           C
ATOM   3266  OG1 THR B 440     -32.591  38.729  43.436  1.00 57.09           O
TER    3267      THR B 440
ENDMDL
END

AlphaFold v2.3.1 behaves as expected and generates two chains. Is the single-chain output expected in FastFold?
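For anyone who wants to check their own outputs, here is a minimal sketch that counts atoms per chain in PDB text. It relies only on the fixed-column PDB layout, where the chain ID is column 22 of ATOM/HETATM records. The helper name `count_chain_atoms` is hypothetical, and the two sample lines are copied from the output above.

```python
from collections import Counter


def count_chain_atoms(pdb_text: str) -> Counter:
    """Count ATOM/HETATM records per chain ID (column 22 of a PDB line)."""
    counts = Counter()
    for line in pdb_text.splitlines():
        # Chain ID sits at index 21 (column 22) in fixed-width PDB records.
        if line.startswith(("ATOM", "HETATM")) and len(line) > 21:
            counts[line[21]] += 1
    return counts


sample = "\n".join([
    "ATOM      1  N   MET B   1      -8.565 -70.079 -22.331  1.00 42.70           N",
    "ATOM      2  CA  MET B   1      -9.393 -69.112 -21.615  1.00 42.70           C",
])
chain_counts = count_chain_atoms(sample)
print(dict(chain_counts))
```

A correctly predicted homodimer should show two chain IDs with equal atom counts. A single key, as here, means only one chain was written.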

Contributor

Gy-Lu commented Mar 23, 2023

Hi, can you provide a result produced by AlphaFold2?
We will try to reproduce and fix the problem.

@s-kyungyong
Author

Hi @Gy-Lu! Some of the AlphaFold outputs are here. I think FastFold is treating the inference as a monomer despite --model_preset multimer. In the folder that stores precomputed alignments, FastFold has only one folder for AT1G52380. AlphaFold stores the same alignments separately in msas/A and msas/B. Perhaps this difference is the cause?

Contributor

Gy-Lu commented Mar 25, 2023

Hi, I have tried your sequence.
FastFold does not treat the inference as a monomer. However, the two sequences have exactly the same tag, so the second one overwrites the first.
After changing the second tag to AT1G523802, I got a PDB file that differs from the AlphaFold ones you provided. I think this might be caused by different weights (it seems you use the v3 model?) and by differences in the MSA and template search.

The alignment issue will be fixed later. However, I am not sure that the model's output is correct. One way to check is to run both tools with the same precomputed features and the same weights and compare the outputs.
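To illustrate the tag collision, here is a minimal sketch. It is not FastFold's actual code, only an assumption about the general mechanism: when parsed FASTA records are stored in a mapping keyed by tag, a second record with the same tag replaces the first. The `parse_fasta` helper and the shortened sequences are hypothetical.

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Return (tag, sequence) pairs in file order."""
    records = []
    tag, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if tag is not None:
                records.append((tag, "".join(chunks)))
            tag, chunks = line[1:].strip(), []
        elif tag is not None:
            chunks.append(line)
    if tag is not None:
        records.append((tag, "".join(chunks)))
    return records


fasta = ">AT1G52380\nMGDSENVQ\n>AT1G52380\nMGDSENVQ\n"
records = parse_fasta(fasta)
by_tag = dict(records)  # duplicate keys collapse: the later entry wins

print(len(records), "records parsed,", len(by_tag), "left after keying by tag")
```

The list keeps both chains, while the tag-keyed mapping keeps only one. That matches the single chain seen in the output PDB.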

@s-kyungyong
Author

Yes! I was also able to model the homodimers by changing the sequence name and adding the alignments accordingly.

Wouldn't it make more sense for FastFold to do this automatically when a FASTA file contains two identical tags and --model_preset multimer is set, instead of having users change the tags?
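A rough sketch of what I mean, purely as an illustration and not an existing FastFold feature: repeated tags get a numeric suffix so that every chain has its own name. The function name `make_tags_unique` and the `_2`, `_3` suffix scheme are my assumptions.

```python
def make_tags_unique(tags: list[str]) -> list[str]:
    """Append _2, _3, ... to repeated tags so that every tag is distinct."""
    used = set()
    result = []
    for tag in tags:
        candidate, n = tag, 1
        # Keep bumping the suffix until the name is not taken.
        while candidate in used:
            n += 1
            candidate = f"{tag}_{n}"
        used.add(candidate)
        result.append(candidate)
    return result


unique_tags = make_tags_unique(["AT1G52380", "AT1G52380"])
print(unique_tags)
```

With renamed tags, the precomputed alignment folders would also need to exist under the new names, which is the manual step I had to do.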

Contributor

Gy-Lu commented Mar 27, 2023

Hi, I am not sure whether there is randomness in preprocessing, but there seems to be.
The reason for using the tag as the MSA folder name is to make precomputed alignments easy to index.
Two identical sequences in one input is a case somewhat beyond our original design.
I think reusing computed alignments is a good thing. Following this principle, the second sequence should use the first one's alignments. Do you think this idea makes sense?
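The idea could look roughly like the sketch below. It is only an outline under assumptions, not the implemented behavior: each chain is mapped to the tag of the first chain with an identical sequence, and that tag names the alignment folder to read. The function name `alignment_sources` and the short sequences are hypothetical.

```python
def alignment_sources(records: list[tuple[str, str]]) -> list[str]:
    """For each (tag, sequence) chain, return the tag whose alignments it reuses."""
    first_tag_for_seq = {}
    sources = []
    for tag, seq in records:
        # setdefault stores the tag on first sight and returns the stored tag after.
        sources.append(first_tag_for_seq.setdefault(seq, tag))
    return sources


chains = [("AT1G52380", "MGDSENVQ"), ("AT1G52380", "MGDSENVQ")]
sources = alignment_sources(chains)
print(sources)
```

Both chains stay in the input list, so the model still sees two chains, while the alignments are computed and stored once.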
