DBREF and SEQADV to go with SEQRES records #19

Open
jsoerensen opened this issue Mar 17, 2020 · 3 comments

Comments

@jsoerensen

It is great that the SEQRES records were added. Is there any chance you could add the DBREF and SEQADV records as well, since these relate strongly to the SEQRES records? The relevant mmCIF categories are:
_struct_ref_seq
_struct_ref_seq_dif
In particular, I'd like the conversion in the to_pdb.hpp file.

@wojdyr
Member

wojdyr commented Mar 23, 2020

I added conversion of DBREF/DBREF1/DBREF2 to/from _struct_ref and _struct_ref_seq.

I had a request to do this before (in the pdb->mmcif direction), and I was just experimenting with sequence alignment, so I thought it was a good time to look into it.
But it took much more time than I expected. I kind of understand why the pdb spec has three DBREF records: longer IDs were not anticipated in the original DBREF, so DBREF1/2 were added later and the original DBREF was kept for compatibility. Not ideal, but it happens.
But why are the pdbx/mmcif categories so poorly designed (as usual)?

Anyway, DBREF is converted now. SEQADV not yet, so I'm leaving this issue open.
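
To illustrate the DBREF vs. DBREF1/DBREF2 split described above, here is a minimal sketch of how a writer might decide which form to emit. The field widths are an assumption taken from a reading of the PDB format v3.3 column layout, and `needs_two_line_dbref` is a hypothetical helper, not the function used in gemmi.

```python
# Field widths of the single-line DBREF record.
# Assumption: these follow the PDB format v3.3 column layout.
ACCESSION_WIDTH = 8    # dbAccession, columns 34-41
ID_CODE_WIDTH = 12     # dbIdCode, columns 43-54
DB_SEQNUM_WIDTH = 5    # dbseqBegin / dbseqEnd, columns 56-60 and 63-67


def needs_two_line_dbref(accession: str, id_code: str,
                         db_begin: int, db_end: int) -> bool:
    """Return True if any field overflows the single-line DBREF record,
    in which case the DBREF1/DBREF2 pair would be needed instead."""
    return (
        len(accession) > ACCESSION_WIDTH
        or len(id_code) > ID_CODE_WIDTH
        or len(str(db_begin)) > DB_SEQNUM_WIDTH
        or len(str(db_end)) > DB_SEQNUM_WIDTH
    )
```

A short UniProt reference such as P00734 / THRB_HUMAN fits the original record, while a 10-character accession would not.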

@jsoerensen
Author

jsoerensen commented Nov 19, 2020

For 2zff.cif, I think the DBREF parser and converter to PDB has a bug.
I get this:
DBREF 2ZFF L -4 17 UNP P00734 THRB_HUMAN 328 363
DBREF 2ZFF H 16 247 UNP P00734 THRB_HUMAN 364 622
DBREF 2ZFF I 54 64 UNP P01050 ITH1_HIRME 54 64
But I was expecting (from the PDB):
DBREF 2ZFF L 1H 14N UNP P00734 THRB_HUMAN 328 363
DBREF 2ZFF H 16 247 UNP P00734 THRB_HUMAN 364 622
DBREF 2ZFF I 54 64 UNP P01050 ITH1_HIRME 54 64
As you can see, I've lost the insertion codes, and the sequence numbering is incorrect for chain L.
The range -4 to 17 also isn't 36 residues long (363 - 328 + 1 = 36).
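
The length mismatch can be checked with a few lines of arithmetic. This is only a sketch: `span_length` is a hypothetical helper and assumes contiguous numbering without insertion codes, which is exactly why it cannot be applied to the 1H..14N range directly.

```python
def span_length(begin: int, end: int) -> int:
    """Number of residues in an inclusive, contiguous numbering range."""
    return end - begin + 1


db_len = span_length(328, 363)    # residues covered in the UniProt sequence
model_len = span_length(-4, 17)   # residues implied by the generated DBREF
mismatch = db_len - model_len     # non-zero means the two ranges disagree
```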

Insertion codes are generally a pain... The data needed to handle them is there, though, in these items:
_struct_ref_seq.pdbx_seq_align_beg_ins_code
_struct_ref_seq.pdbx_seq_align_end_ins_code
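
As a rough sketch of how the insertion codes could end up in the output, the function below formats one DBREF line with the insertion codes in the columns right after each sequence number. The column layout is an assumption based on a reading of the PDB format v3.3 specification; `format_dbref` and `ins_code_column` are hypothetical names and do not reflect the code in to_pdb.hpp.

```python
def ins_code_column(value):
    """Map an mmCIF insertion-code value to the one-character PDB column."""
    # mmCIF marks absent values with '?' or '.'; the PDB format uses a blank.
    return " " if value in (None, "", "?", ".") else value


def format_dbref(entry_id, chain, beg, beg_ins, end, end_ins,
                 database, accession, db_id_code, db_beg, db_end):
    """Format a single-line DBREF record (68 columns in this sketch)."""
    return (
        f"DBREF  {entry_id:<4} {chain:1} "            # cols 1-14
        f"{beg:4d}{ins_code_column(beg_ins)} "        # cols 15-20
        f"{end:4d}{ins_code_column(end_ins)} "        # cols 21-26
        f"{database:<6} {accession:<8} {db_id_code:<12} "  # cols 27-55
        f"{db_beg:5d}  {db_end:5d} "                  # cols 56-68, db ins codes blank
    )


# Values taken from the expected chain L record quoted above.
line = format_dbref("2ZFF", "L", 1, "H", 14, "N",
                    "UNP", "P00734", "THRB_HUMAN", 328, 363)
```

The begin and end insertion codes here would come from `pdbx_seq_align_beg_ins_code` and `pdbx_seq_align_end_ins_code`; the database-side insertion code columns are simply left blank in this sketch.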

@wojdyr
Member

wojdyr commented Nov 20, 2020

That was actually an intentional simplification: the mapping between the label and author sequence numbers in the parts of the sequence that were missing from the model was based on guessing.
It should be fixed now.
