The sequence length does not match the number of residues #538

Open
DoubleSheep2 opened this issue Aug 17, 2023 · 7 comments

@DoubleSheep2

DoubleSheep2 commented Aug 17, 2023

I'm trying to create a coarse-grained model for a virus with an atomic model consisting of 60 chains, each having around 500 amino acids. When I use the following command for coarse-graining,

martinize2 -f particle_forCMD_minimize.pdb -o particle_forCMD_minimize.top -x particle_forCMD_minimize_cg.pdb -ff martini3001 -p backbone -maxwarn 1 -mutate HSD:HIS -mutate HSP:HIH -dssp /home/emuser/miniconda2/envs/dssp/bin/mkdssp

I get an error saying, "The sequence length does not match the number of residues. The sequence has 476 elements for 477 residues." This error occurs during the dssp step. I believe this error isn't related to the input model because when I reduced the number of chains, the command worked fine. How can I resolve this issue?

@pckroon
Member

pckroon commented Aug 18, 2023

How many residues do you have in your system exactly? How many residues does DSSP find/annotate if you run it on particle_forCMD_minimize.pdb?
If this doesn't shed light, try running with -v. This will preserve any intermediate files, such as the one that we feed to dssp.
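
One way to answer the residue-count question is a small script along these lines. This is only a sketch, not part of martinize2: the function name `residues_per_chain` is made up for illustration, and it assumes standard fixed-column PDB records (chain ID in column 22, residue number plus insertion code in columns 23-27) and that a new chain starts when the chain ID changes or after a `TER` record.

```python
def residues_per_chain(lines):
    """Return [(chain_id, n_residues), ...] in file order.

    Hypothetical helper: a new chain is assumed to start whenever the
    chain ID changes or a TER record has been seen.
    """
    counts = []
    prev_chain = None
    prev_residue = None
    for line in lines:
        record = line[:6].strip()
        if record == "TER":
            # Force a new chain segment at the next atom record.
            prev_chain = None
        elif record in ("ATOM", "HETATM"):
            chain = line[21]                # column 22: chain ID
            residue = line[22:27].strip()   # columns 23-27: resSeq + iCode
            if chain != prev_chain:
                counts.append([chain, 0])
                prev_chain = chain
                prev_residue = None
            if residue != prev_residue:
                counts[-1][1] += 1
                prev_residue = residue
    return [tuple(entry) for entry in counts]
```

Running it on `open("particle_forCMD_minimize.pdb")` and on the intermediate files kept by `-v` should show whether any chain has a residue count other than the expected one.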

@DoubleSheep2
Author

The entire virus consists of 60 identical capsid protein monomers, with each chain containing 477 amino acids (residues 129-605), totaling 28,620 amino acids. In debug mode, I inspected the last dssp-generated PDB file (the 34th) before the error. It looks quite unusual: chains 1-33 start at residue 129 and end at 605, while this chain (the 34th) starts at residue 544, runs up to 605, then resets and starts again from residue 129.
The order of the chains in PDB file does not affect the occurrence of the error when processing the 34th chain. Hence, could this be due to the large system size causing the program to encounter issues similar to stack overflow problems?
Running environment: mkdssp v3.0.0 (conda) and martinize2 v0.9.3 (conda).

@pckroon
Member

pckroon commented Aug 18, 2023

> The order of the chains in PDB file does not affect the occurrence of the error when processing the 34th chain. Hence, could this be due to the large system size causing the program to encounter issues similar to stack overflow problems?

No, I don't think so. If it were, I think the errors would also show up differently.

Does it work if you remove the afflicted/suspect chain from your input file? Does it have missing atoms in critical spots? What does the dssp output look like if you feed that specific DSSP input file to it?

@DoubleSheep2
Author

I think I've identified the cause of the error, which might be related to atom serial numbers. Due to limitations in the PDB format, atom serial numbers can't go beyond 99999, and my system has a total of 230,000 atoms. When I set all atom serial numbers to 99999, dssp threw an error when processing the 7th chain. However, when I numbered the atoms cyclically from 1 to 99999, the error occurred when processing the 42nd chain. Is there a preprocessing approach that can be used for larger systems?
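
To check whether a file is affected by this limit, something like the following may help. It is a rough sketch under the assumption that the serial number sits in columns 7-11 of `ATOM`/`HETATM` records, as in the standard PDB layout; `duplicated_serials` is a name chosen here for illustration, not an existing tool.

```python
from collections import Counter


def duplicated_serials(lines):
    """Return the atom serial fields that occur more than once.

    Hypothetical helper: serials are compared as text, so wrapped
    numbers and placeholder fields such as '*****' are both reported.
    """
    serials = Counter(
        line[6:11].strip()              # columns 7-11: atom serial
        for line in lines
        if line[:6].strip() in ("ATOM", "HETATM")
    )
    return sorted(s for s, n in serials.items() if n > 1)
```

An empty result means every atom record has a unique serial; a non-empty one means the numbering has wrapped or been clamped somewhere in the file.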

@pckroon
Member

pckroon commented Aug 21, 2023

Hmm, I know for sure we've seen this issue before, but I can't remember the fix/workaround. What do the atom numbers look like for the 42nd chain? Renumbering the atoms when writing the PDB for dssp may be a reasonably quick solution.

@DoubleSheep2
Author

Thank you so much for your assistance. I tried numbering each chain's atoms starting from 1, and it resolved the issue. Even in a system with 140 chains, no errors occurred. Hopefully, this solution can help others as well.
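
For anyone who wants to apply the same workaround, here is a minimal sketch of the renumbering. It is not the script used above and not martinize2 code: the function name `renumber_atoms_per_chain` is hypothetical, and it assumes standard fixed-column PDB records, with a new chain starting when the chain ID (column 22) changes or after a `TER` record. `TER` and `CONECT` records are left untouched, so any `CONECT` entries will no longer match the new serials.

```python
def renumber_atoms_per_chain(lines):
    """Restart atom serial numbers at 1 for every chain.

    Hypothetical helper: only ATOM/HETATM records are rewritten;
    every other line is passed through unchanged.
    """
    renumbered = []
    serial = 0
    prev_chain = None
    for line in lines:
        record = line[:6].strip()
        if record == "TER":
            # Force a restart at the next atom, even if the chain ID repeats.
            prev_chain = None
        elif record in ("ATOM", "HETATM"):
            chain = line[21]            # column 22: chain ID
            if chain != prev_chain:
                serial = 0
                prev_chain = chain
            serial += 1
            if serial > 99999:
                raise ValueError("a single chain has more than 99999 atoms")
            # Columns 7-11 hold the right-aligned serial number.
            line = f"{line[:6]}{serial:5d}{line[11:]}"
        renumbered.append(line)
    return renumbered
```

Reading the input with `open(...).readlines()`, passing the lines through this function, and writing the result to a new file gives a PDB that can then be fed to martinize2.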

@pckroon
Member

pckroon commented Aug 22, 2023

Thanks for confirming that fixes it (and I'm happy you found a workaround). I'll put it on the list to have the DSSP processor renumber atoms before writing the dssp input pdb files.
