SEQRES writer #4446

smallfishabc · 2024-02-01T20:20:40Z

SEQRES records are usually needed to fix missing residues or loops in PDB files. Existing Python libraries, including Biopython and MDTraj, cannot write SEQRES records. If you could add this feature, it would be really helpful.

IAlibay · 2024-02-11T09:57:12Z

That's an interesting idea @smallfishabc - indeed I do remember other folks asking for a similar thing before.

Could you elaborate on exactly what you would like to see here? How would one fetch the sequence if the input PDB was missing portions of it, etc.?

smallfishabc · 2024-02-12T18:38:07Z

Hi,
Technically, SEQRES provides a reference point (the full sequence of the protein construct) for PDB entries whose residues are missing due to the limits of X-ray diffraction (XRD). However, most Python libraries omit SEQRES when they save a processed structure to a PDB file, so the output loses this important reference. I want to add SEQRES back so that we can use PyMOL, Modeller, or PDBFixer to model the missing loops.

I have already done this successfully by using BioPandas to copy the SEQRES records of the corresponding protein chain from the original PDB files downloaded from RCSB. However, I may need more tools to manipulate SEQRES records and to generate them from input sequences.
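
A minimal sketch of the copy-over step described above, written in plain Python rather than BioPandas. It assumes both files are legacy fixed-column PDB text, that the chain ID is the same in the original and processed files, and that SEQRES should go before the first MODRES/HET/secondary-structure/CRYST1/coordinate record, following the section order of the PDB format. `seqres_records` and `copy_seqres` are hypothetical helpers, not part of MDAnalysis or BioPandas.

```python
# Record types that come after SEQRES in the PDB section order (assumption:
# prefix matching is good enough, e.g. "HET" also covers HETNAM and HETATM).
AFTER_SEQRES = ("MODRES", "HET", "HELIX", "SHEET", "SSBOND", "LINK", "CISPEP",
                "SITE", "CRYST1", "ORIGX", "SCALE", "MTRIX", "MODEL", "ATOM",
                "TER", "END")


def seqres_records(pdb_text: str, chain_id: str) -> list[str]:
    """Return the SEQRES lines of one chain (the chain ID sits in column 12)."""
    return [line for line in pdb_text.splitlines()
            if line.startswith("SEQRES") and line[11:12] == chain_id]


def copy_seqres(original_text: str, processed_text: str, chain_id: str) -> str:
    """Insert one chain's SEQRES records from the original entry into a processed PDB."""
    records = seqres_records(original_text, chain_id)
    # Drop any SEQRES lines the processed file already has, to avoid duplicates
    lines = [ln for ln in processed_text.splitlines() if not ln.startswith("SEQRES")]
    # Place the records before the first line that belongs after the primary-structure section
    idx = next((i for i, ln in enumerate(lines) if ln.startswith(AFTER_SEQRES)), len(lines))
    return "\n".join(lines[:idx] + records + lines[idx:]) + "\n"
```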

I am not sure whether this is clear enough.

orbeckst · 2024-03-25T19:22:47Z

@smallfishabc do you want MDAnalysis to be able to

- write SEQRES entries to an output PDB file
- read SEQRES entries from an input PDB file?

For the writing part, how would you envisage supplying any missing SEQRES (what the earlier comment in #4446 was asking)? I.e., if your universe contains MDDA---PDK with a gap while the complete sequence is MDDAVRAPDK, how would you complete the SEQRES record with the missing VRA?
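
One way to see which SEQRES residues are absent from the model is to align the observed one-letter sequence against the full SEQRES sequence. The sketch below uses Python's `difflib.SequenceMatcher` as a crude stand-in for a proper biological alignment (for example Biopython's `PairwiseAligner`); it assumes both sequences are already one-letter strings and only reports stretches that are present in SEQRES but missing from the coordinates. `missing_segments` is a hypothetical name.

```python
from difflib import SequenceMatcher


def missing_segments(full_seq: str, observed_seq: str) -> list[tuple[int, int, str]]:
    """Return (start, end, residues) for SEQRES stretches absent from the model.

    Positions are 1-based and inclusive, counted along full_seq.
    """
    matcher = SequenceMatcher(None, full_seq, observed_seq, autojunk=False)
    gaps = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "delete":  # in SEQRES, but not in the observed coordinates
            gaps.append((i1 + 1, i2, full_seq[i1:i2]))
    return gaps


print(missing_segments("MDDAVRAPDK", "MDDAPDK"))  # [(5, 7, 'VRA')]
```

Mutated or mismatched residues show up as `replace` opcodes and are deliberately not reported as gaps; with short or repetitive sequences `difflib` can also pick an alignment a biologist would not, which is why a real aligner is preferable in practice.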

smallfishabc · 2024-03-26T17:18:41Z

I believe that for most PDBs, the missing part is documented in the SEQRES section of the original PDB file, serving as a reference. This allows us to identify which portion of the sequence is missing, facilitating its subsequent correction. However, existing tools often generate PDB files without including the SEQRES section. This omission poses challenges when attempting to locate missing sequences. Therefore, I propose leveraging MDAnalysis to extract SEQRES information directly from the PDB database and incorporate it as a reference in the new PDB file. Alternatively, users could provide the reference sequence they require, which could then be included in the PDB file.
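
If the user supplies the reference sequence, writing it out is mostly a matter of the fixed-column SEQRES layout: columns 1-6 hold `SEQRES`, 8-10 the record serial number, 12 the chain ID, 14-17 the total residue count, and residue names start at column 20, at most 13 per record. A minimal sketch, assuming a one-letter protein sequence containing only the 20 standard amino acids (anything else raises `KeyError`); `format_seqres` is a hypothetical helper.

```python
THREE_LETTER = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}


def format_seqres(sequence: str, chain_id: str) -> list[str]:
    """Format a one-letter protein sequence as fixed-width SEQRES records."""
    resnames = [THREE_LETTER[aa] for aa in sequence.upper()]
    records = []
    for serial, start in enumerate(range(0, len(resnames), 13), start=1):
        chunk = resnames[start:start + 13]  # at most 13 residues per record
        # cols 1-6 "SEQRES", 8-10 serial, 12 chain ID, 14-17 residue count, 20+ names
        line = f"SEQRES {serial:>3} {chain_id} {len(resnames):>4}  " + " ".join(chunk)
        records.append(line.ljust(80))  # pad to the 80-column record width
    return records
```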

orbeckst · 2024-03-26T18:37:38Z

What should this functionality look like in code, i.e., write Python code for how you'd want to use MDAnalysis to work with SEQRES. Seeing how it would be used will help to better understand if this is something that's easy to do.

smallfishabc · 2024-03-26T18:47:26Z

I'll provide an example of how I'm currently addressing this issue. My aim is to rectify missing flexible loops in PDBs to generate and model experimental data.

The input I receive consists of single-chain PDBs generated without SEQRES. To tackle this, I use Biopython and BioPandas to retrieve the raw PDB data from RCSB. From there, I extract the SEQRES corresponding to the chain of interest, align it with the PDB data, and save it as a FASTA file. Subsequently, I adapt PDBFixer to use the FASTA file as a reference to mend my PDB files.
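
The SEQRES-to-FASTA step could look roughly like the sketch below. It assumes standard fixed-column SEQRES records and maps anything outside the 20 standard amino acids (e.g. MSE) to `X`. The helper name and the FASTA header format are illustrative assumptions, and no alignment to the coordinates is done here.

```python
ONE_LETTER = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_to_fasta(pdb_text: str, chain_id: str, header: str, width: int = 60) -> str:
    """Turn one chain's SEQRES records into a FASTA string."""
    resnames = []
    for line in pdb_text.splitlines():
        # Chain ID is column 12; residue names start at column 20
        if line.startswith("SEQRES") and line[11:12] == chain_id:
            resnames.extend(line[19:].split())
    seq = "".join(ONE_LETTER.get(name, "X") for name in resnames)
    body = [seq[i:i + width] for i in range(0, len(seq), width)]  # wrap long sequences
    return "\n".join([f">{header}"] + body) + "\n"
```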

If MDAnalysis could write properly formatted SEQRES records from a provided FASTA file, I wouldn't need to tweak PDBFixer. An enhanced version would automatically fetch the SEQRES records based on the four-character PDB ID and the chain identifier, and write them directly into the output PDB file.
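
The "fetch by PDB ID" idea could be sketched with the standard library alone. This assumes the RCSB download URL pattern `https://files.rcsb.org/download/<ID>.pdb` and that the entry is available in legacy PDB format (some large entries are only distributed as mmCIF, in which case the download fails with an HTTP error). The fetcher is passed in as a parameter so a cached copy or a test stub can replace the network call. Both function names are hypothetical.

```python
from urllib.request import urlopen


def download_pdb(pdb_id: str) -> str:
    """Download the legacy-format PDB file of an entry from RCSB."""
    url = f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def fetch_seqres(pdb_id: str, chain_id: str, fetch=download_pdb) -> list[str]:
    """Return the SEQRES records of one chain of a PDB entry.

    `fetch` maps a PDB ID to the file's text; swap it for a cache or stub.
    """
    text = fetch(pdb_id)
    # Keep only SEQRES lines whose chain ID (column 12) matches
    records = [line for line in text.splitlines()
               if line.startswith("SEQRES") and line[11:12] == chain_id]
    if not records:
        raise ValueError(f"no SEQRES records for chain {chain_id!r} in {pdb_id}")
    return records
```

For example, `fetch_seqres("1CRN", "A")` would download the entry and return chain A's SEQRES lines, which could then be placed in a processed file ahead of its CRYST1/ATOM records.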

orbeckst · 2024-03-30T16:57:00Z

If you could provide some code for how to get SEQRES and how you would want MDAnalysis to work (e.g. u.atoms.write("protein.pdb", seqres=SEQRES) where SEQRES is a data structure you have to tell us about) then it would be easier to decide if this is a feature that is in scope.

At the moment, you're the only person who knows what this feature would be. If you want other people in an open source project to consider working on what you want, you need to convince them that this is a cool feature and show how it might work and what it might look like.

orbeckst added Format-PDB new-feature labels Mar 25, 2024

orbeckst added the "more information needed" label Mar 30, 2024
