Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Edge Case in Definition of "Hydrogen Bond" vs "Other" Contact Type #85

Open
RMCrean opened this issue Dec 10, 2021 · 2 comments
Open

Edge Case in Definition of "Hydrogen Bond" vs "Other" Contact Type #85

RMCrean opened this issue Dec 10, 2021 · 2 comments

Comments

@RMCrean
Copy link

RMCrean commented Dec 10, 2021

First, thanks for this very nice tool, it's been very useful for my work. I think I have found an "edge case" though with the "determine_ctype" function inside Biochemisty.py.

I have being running PyContact on my replicas separately before later combining them.

In the image below are two contacts identified from a merged trajectory (1500 frames each). The only difference between the contact type is that one is labelled "Other" and in the other as a "hydrogen bond". (I have made my own script to process the PyContact output for how I would like the labels to look, which is why the labelling format is non-standard).

image

Clearly these are the same contact but in one case PyContact labels it a Hbond, and in the other as "Other". When I visualise the trajectory the interactions are clearly very largely "Other" in both trajectories but in a very small number of frames of one trajectory the two residues come into hydrogen bonding distance.

From looking at the determine_ctype function and then the hbondFramesScan function it appears that you need a single frame to have a hydrogen bond to consider the contact type as a hydrogen bond. Can you confirm if this is the case?

To be clear, I'm not requesting you fix this because I appreciate this is kind of a rare edge case, I just want to make sure I understand correctly so I can correctly handle it properly in my analysis.

Thanks!

@maxscheurer
Copy link
Owner

Hi @RMCrean, thanks for reporting this.

Can you confirm if this is the case?

Yes, I just looked at the code in question again and you're absolutely right.

I have not worked on PyContact over the past years because my own research has "moved away" from MD simulations... If you'd like to propose fixes, please feel free to open a PR any time!

@RMCrean
Copy link
Author

RMCrean commented Dec 10, 2021

Hi @maxscheurer, thanks for confirming this is the case and no worries at all that you have moved on to other work.

And thanks for the invite to make a PR for this, although I'm not entirely sure what the optimal (if there is one) solution would be here.

For me, the issue was inconsistency across runs, which I can solve (just tested now) by merging all my runs together into one trajectory and running PyContact calculations on small sections of the protein (and then merge these all back together again later).

We could update the code to change the cutoff from 1 frame to some fraction of frames with a Hbond but that wouldn't solve the consistency issue (I can imagine it might make the situation worse), and of course whatever fraction chosen would be arbitrary. So perhaps it is better to just be aware of it - unless you can think of a way to handle it?

(Sorry to go semi-off topic) Regarding PRs though, I do have a python script (run_pycontact.zip, attached as zip as GitHub wouldn't accept it otherwise) which other users may find useful which I can submit as a PR if you think it is worthwhile. I had issues with writing the contacts directly to file with the writeContactDataToFile() command so this script lets me run PyContact and process the data without having to save a session file and then load the GUI to write the contacts out.

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants