SEQRES writer #4446

Open
smallfishabc opened this issue Feb 1, 2024 · 7 comments
Labels
Format-PDB · more information needed (Please reply to requests for information or the issue will be closed.) · new-feature

Comments

@smallfishabc

SEQRES records are usually needed to fix missing residues or loops in a PDB file. Existing Python libraries, including BioPython and MDTraj, cannot write a SEQRES record. It would be really helpful if you could add this feature.

@IAlibay
Member

IAlibay commented Feb 11, 2024

That's an interesting idea @smallfishabc - indeed I do remember other folks asking for a similar thing before.

Could you elaborate on exactly what you would like to see here? For example, how would one fetch the sequence if the input PDB was missing portions of it?

@smallfishabc
Author

Hi,
Technically, SEQRES provides a reference point (the full sequence of the protein construct) for PDBs that have missing residues due to the limits of XRD. However, most Python libraries omit SEQRES when saving a PDB to file after processing, so they lose an important reference point. I want to add SEQRES back so that we can use PyMOL, Modeller, or pdbfixer to model the loop.

I have already done this successfully by copying the SEQRES records of the corresponding protein chain (the one with missing residues) from the original PDB downloaded from RCSB. I did that using BioPandas. But I may need more tools to manipulate and generate SEQRES from input sequences.

I am not sure whether this is clear enough.
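
For reference, the copy-from-RCSB step described above comes down to reading the SEQRES records of the original file. Here is a minimal, stdlib-only sketch, assuming fixed-column SEQRES lines as laid out in the PDB format (chain ID in column 12, residue names from column 20). `read_seqres` is a hypothetical helper name, not a function from BioPandas or MDAnalysis.

```python
from collections import defaultdict


def read_seqres(pdb_lines):
    """Collect SEQRES residue names per chain from an iterable of PDB lines.

    Hypothetical helper for illustration; not part of any library.
    """
    chains = defaultdict(list)
    for line in pdb_lines:
        if line.startswith("SEQRES"):
            chain_id = line[11]                  # column 12 holds the chain ID
            chains[chain_id].extend(line[19:].split())  # names start at column 20
    return dict(chains)


example = [
    "SEQRES   1 A   10  MET ASP ASP ALA VAL ARG ALA PRO ASP LYS",
    "END",
]
seqres = read_seqres(example)
print(seqres["A"])
```

The result is a plain dict mapping chain ID to a list of three-letter residue names, which is one possible shape for the data a writer would need.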

@orbeckst
Member

@smallfishabc do you want MDAnalysis to be able to

  • write SEQRES entries to an output PDB file
  • read SEQRES entries from an input PDB file?

For the writing part, how would you envisage supplying any missing SEQRES (what #4446 (comment) was asking), i.e., if your universe contains MDDA---PDK with a gap while the complete sequence is MDDAVRAPDK then how will you be able to complete the SEQRES record with the missing VRA?
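
To make the gap example concrete: if both the observed sequence and the complete reference sequence are available, the missing stretch can be located by comparing the two. The sketch below uses `difflib.SequenceMatcher` from the standard library as a stand-in for a real sequence alignment. `find_missing` is a hypothetical name, and the placement of a gap can be ambiguous when neighbouring residues repeat, so residue numbering or a proper alignment tool would be more robust in practice.

```python
from difflib import SequenceMatcher


def find_missing(observed, full):
    """Return (index_in_full, segment) for stretches of `full` absent from `observed`.

    Hypothetical helper; only pure insertions are reported, mismatches are ignored.
    """
    matcher = SequenceMatcher(None, observed, full, autojunk=False)
    missing = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":
            missing.append((j1, full[j1:j2]))
    return missing


# Observed residues MDDA---PDK (gap removed) against the full construct.
gaps = find_missing("MDDAPDK", "MDDAVRAPDK")
print(gaps)
```

This only answers where the gap is. The missing residues themselves still have to come from somewhere, which is the open question in this thread.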

@smallfishabc
Author

I believe that for most PDBs, the missing part is documented in the SEQRES section of the original PDB file, serving as a reference. This allows us to identify which portion of the sequence is missing, facilitating its subsequent correction. However, existing tools often generate PDB files without including the SEQRES section. This omission poses challenges when attempting to locate missing sequences. Therefore, I propose leveraging MDAnalysis to extract SEQRES information directly from the PDB database and incorporate it as a reference in the new PDB file. Alternatively, users could provide the reference sequence they require, which could then be included in the PDB file.

@orbeckst
Member

What should this functionality look like in code? That is, please write Python code showing how you'd want to use MDAnalysis to work with SEQRES. Seeing how it would be used will help us better understand whether this is something that's easy to do.

@smallfishabc
Author

I'll provide an example of how I'm currently addressing this issue. My aim is to rectify missing flexible loops in PDBs to generate and model experimental data.

The input I receive consists of single-chain PDBs generated without SEQRES. To tackle this, I utilize BioPython and BioPandas to retrieve raw PDB data from RCSB. From there, I extract the SEQRES corresponding to the chain of interest, align it with the PDB data, and save it as a fasta file. Subsequently, I adapt pdbfixer to utilize the fasta file as a reference to mend my PDB files.

If MDAnalysis could output properly formatted SEQRES records from the provided FASTA file, I wouldn't need to tweak pdbfixer. An enhanced version would automatically fetch the SEQRES record based on the PDB's four-character ID and chain ID, writing it directly into the output PDB.
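
As an illustration of the "FASTA in, SEQRES out" idea, here is a rough sketch that formats a one-letter sequence into SEQRES lines. It assumes the standard PDB column layout (serial number in columns 8-10, chain ID in column 12, residue count in columns 14-17, up to 13 residue names per line starting at column 20) and covers only the 20 standard amino acids. `format_seqres` and `THREE_LETTER` are hypothetical names, not MDAnalysis API.

```python
# One-letter to three-letter codes for the 20 standard amino acids only.
THREE_LETTER = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}


def format_seqres(sequence, chain_id="A"):
    """Format a one-letter sequence as 80-column SEQRES lines (hypothetical helper)."""
    names = [THREE_LETTER[aa] for aa in sequence.upper()]
    lines = []
    # The PDB format allows at most 13 residue names per SEQRES line.
    for serial, start in enumerate(range(0, len(names), 13), start=1):
        chunk = " ".join(f"{name:>3}" for name in names[start:start + 13])
        line = f"SEQRES {serial:>3} {chain_id} {len(names):>4}  {chunk}"
        lines.append(line.ljust(80))
    return lines


seqres_lines = format_seqres("MDDAVRAPDK", chain_id="A")
for line in seqres_lines:
    print(line.rstrip())
```

Non-standard residues would raise a `KeyError` here. A real implementation would need a policy for those, and for multi-chain input.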

@orbeckst added the "more information needed" label on Mar 30, 2024
@orbeckst
Member

If you could provide some code for how to get SEQRES and how you would want MDAnalysis to work (e.g. u.atoms.write("protein.pdb", seqres=SEQRES) where SEQRES is a data structure you have to tell us about) then it would be easier to decide if this is a feature that is in scope.

At the moment, you're the only person who knows what this feature would be. If you want other people in an open source project to consider working on what you want, you need to convince them that this is a cool feature and show how it might work and what it might look like.
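
To give the discussion something concrete: the `seqres=` keyword above is hypothetical and does not exist in MDAnalysis. Until such a feature is designed, one possible workaround is to post-process the written file and splice pre-formatted SEQRES lines in front of the first record that belongs after them. The sketch below works on plain text only. `insert_seqres` and `AFTER_SEQRES` are made-up names, and the list of record prefixes is an assumption that is probably not exhaustive.

```python
# Prefixes of record types that follow SEQRES in a PDB file (assumed, not exhaustive).
AFTER_SEQRES = (
    "MODRES", "HET", "HELIX", "SHEET", "SSBOND", "LINK", "CISPEP", "SITE",
    "CRYST1", "ORIGX", "SCALE", "MTRIX", "MODEL", "ATOM", "END",
)


def insert_seqres(pdb_text, seqres_lines):
    """Return PDB text with `seqres_lines` placed before the first later record."""
    lines = pdb_text.splitlines()
    for index, line in enumerate(lines):
        if line.startswith(AFTER_SEQRES):
            break
    else:
        index = len(lines)  # no later record found: append at the end
    lines[index:index] = seqres_lines
    return "\n".join(lines) + "\n"


pdb_text = (
    "TITLE     EXAMPLE\n"
    "REMARK   1 WRITTEN WITHOUT SEQRES\n"
    "ATOM      1  N   MET A   1       0.000   0.000   0.000  1.00  0.00           N\n"
    "END\n"
)
patched = insert_seqres(pdb_text, ["SEQRES   1 A    1  MET"])
print(patched)
```

Whether a built-in version should accept formatted lines, a dict of chain ID to residue names, or a FASTA path is exactly the design question that still needs an answer.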


3 participants