Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Arc based atom bonds #94

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from
Draft

Arc based atom bonds #94

wants to merge 2 commits into from

Conversation

douweschulte
Copy link
Owner

Fixes #80.

The bonds are now implemented using Arc, meaning they behave in the way I envisioned them to be. The major problem before merging this into the main branch is that I do not like how it works with Hierarchies.

Left to do:

  • Make use of Arc in Hierarchies (preventing a double lookup)
  • Support more bond types
  • Give the code some more love
  • Update the docs

@douweschulte douweschulte changed the title Preliminairy work on Arc based atom bonds (#80) Arc based atom bonds Jul 21, 2022
@douweschulte
Copy link
Owner Author

Okay upon further investigation some things stall the progress a bit.

  • Arc<RefCell<..>> not supported by Rayon
  • Needs Ref and RefMut everywhere
  • Every borrow everywhere could potentially panic (maybe add try_ stuff everywhere?)
  • Hierarchies need to use Ref<Atom>

@hallsopp
Copy link

Hey @douweschulte, Is work on this still active? I would like to utilise it for a project I am working on. Are you in need of any help?

@douweschulte
Copy link
Owner Author

This PR has stalled a bit, mainly because I was not able to figure out how to make it work properly and because I did not have a use case for it myself. So if you could explain what you want to do with it more we could revise the code to support that use case.

For the actual implementation then if you feel courageous you could try to tackle this yourself, but it is quite involved so be warned. Otherwise I could work on an implementation, but that would be after my holidays in a couple of weeks.

@hallsopp
Copy link

Thanks for getting back to me, @douweschulte. I think that the non-par implementation will actually suffice for my use case - it would be cool to see this in the future tho. 😄

@douweschulte
Copy link
Owner Author

Do you need anything more than can be provided by the function that is already in place? With this one you can query all bonds and get the pairs of atoms connected by each bond. If needed this function could be reworked to give mutable references as well I suppose. But this means that you need to keep the pdb struct at hands and loop over all bonds to find the ones you are interested in. But if this is what you need generating a couple of user friendly functions which try to prevent doing useless work should be quite easy.

The main difference with this stalled PR is that in the current version the PDB has a list of all bonds with all ids of each atom involved and when the bonds are queried it uses that to find references to all atoms involved, while this PR tried to change the behaviour to make it so that all atoms have a list of all other atoms they bind saved to their own struct. As this last options involves keeping pointers to other structs in different places in the PDB hierarchy this is fundamentally at odds with the rust philosophy and that is why it is stalled.

Sorry it took a while for me to read back into what I was trying to accomplish here. Let me know what your intended use case is so that we can decide what the best path forward is.

@hallsopp
Copy link

Actually, the change in behaviour for referenced Atoms within each Atom individually would make what I am doing much easier, but I understand the conceptual dilemma with implementing that. I am working on the functionality to build a graph from the PDB object, and while technically possible to do this now, the time complexity is far from optimal with so many lookups.

@douweschulte
Copy link
Owner Author

Internally all bonds are stored as a single list of the IDs of both atoms and the type. Would having access to this list directly make your life easier? This makes it so that you do not have to spend time looking up the atoms beforehand, and would make it possible to get mutable references if you have mut access to the above layer(s) in your code. I might've able to get this in before holidays if neede.

Alternatively given that you get the full list of all bonds if you would make two dictionaries one indexed by the first atom and one by the second solve your problem (or any kind of organised tree layout to make lookups fast)? I this way you will only pay for the atom reference lookup once, and I think I already made it log lookup time thanks to sorting. And the dictionaries together essentially for a full graph of all bonds. But using it like this maybe if structered very differently from how you want to use it.

Also if I remember correctly only 'special' bonds are saved in the pdb files and those are the ones that are present in the output from the mentioned function. If all normal bonds are needed (think: amino acid backbone, internal amino acid bonds) then some way of figuring those out will have to be found first, possibly with thertree spatial lookup, or based on previous knowledge on the amino acids. But that part I did not build in, if you build something like that I would be happy to include that in pdbtbx, as it could be useful for other people as well.

@hallsopp
Copy link

Direct access to the adjacency list might be useful, but please don't prioritise this if you have other things to finish before your time off because the current method would be the same time complexity I think. Creating two hash tables for target + source is an interesting idea tho, I cannot remove the lookups altogether but this would at least make it constant.

Part of the problem with what I am trying to do comes from my adjacency matrix and the fact that currently each node is assigned a sequential usize index - if I was able to map the index to the atom serial number for example, then that would improve runtime, but this comes with it's own pitfalls. To get around this I can build a HashMap on top and map the node index to the serial number, but when I have a molecule with >100k atoms (plus, not every atom has a bond anyway) I don't know if this would scale very well xD. But this is a problem that has to be fixed on my end, not within pdbtbx.

I am pretty new to this, but I believe that the graph DS I am working on could facilitate network analysis for bond inferencing. if this is something you envision as a feature for this crate, I would be more than happy to copy it across at some point and implement some of the algorithms.

As a side point, I am conscious of polluting the PR comments with unnecessary verbiage. Do you (or plan to) have a Discord or something for further discussions around pdbtbx? Or is this ok?

@douweschulte
Copy link
Owner Author

Good to hear that the hashmaps could help you solve some time complexity. We are indeed polluting this thread a bit, opening a separate issue at some point is warranted. I do not have an active discord but if you want to continue the discussion on a bit more informal/private note using my email (in the cargo.toml) is the best way. If you want we could also schedule a call at some point when I am back to talk about the possibilities for your code and inclusion into pdbtbx. For now I am quite interested in the idea of including this here.

Have a nice summer, I will be back in a couple of weeks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support atom bonds
2 participants