Add SASA support #101

Open
OWissett opened this issue May 16, 2023 · 14 comments
Labels
enhancement New feature or request

Comments

@OWissett
Collaborator

OWissett commented May 16, 2023

I have developed a Rust port of the C library, freesasa, which performs protein surface area calculations.

Do you think this could be incorporated into this library? It is already built to support pdbtbx structs.

I am happy to fork this repo and get working on making it more integrated, if people feel this would be a good thing.

@douweschulte
Owner

Good work! I have not worked with this particular library before, but from a cursory glance at the documentation it looks like a small, well-scoped calculation that could fit nicely into pdbtbx. If the code you wrote is really big, then maybe it would make more sense as a separate crate. If it needs a lot of dependencies, I would put it behind a separate feature so users can opt out of compiling it when they do not need it. But that is just personal preference.

In conclusion, I am open to it and would be happy to get some more details!

@douweschulte douweschulte added the enhancement New feature or request label May 17, 2023
@OWissett
Collaborator Author

Yeah, so it is not too big, but I agree that adding it as a feature would be good. Currently, the crate is privately hosted on my research group's GitLab. I will look at integrating it at some point soon.

I am keen to get involved in developing this package, as I prefer writing Rust over Python for most of what we do. The only reason to choose Python at the moment is the number of packages written for it, so this must change!

@OWissett
Collaborator Author

Also, I should say that it isn't really a port (that was the wrong choice of words); it is a set of FFI bindings to the C library. So it still uses the underlying C library.

You can look at the raw FFI crate freesasa-sys here: https://lib.rs/crates/freesasa-sys

@douweschulte
Owner

douweschulte commented May 17, 2023

Perfect! Sounds like a good fit for a new feature for pdbtbx. You are welcome to get involved. As I am not working with protein structures on a daily basis anymore, my work on pdbtbx has been sparse recently. I am working on antibody sequencing from serum with mass spec instead (I saw you are a PhD in an antibody design group, nice coincidence). But I am more than willing to review PRs.
(Also a small note: we have a public holiday here and I will be away for a couple of days.)

@rvhonorato
Contributor

hey @OWissett, this is something I'm also interested in.

Did you move forward with the implementation here? I'd be happy to collaborate on that.

@rvhonorato
Contributor

ping @OWissett 👀

@OWissett
Collaborator Author

> ping @OWissett 👀

@rvhonorato Hey, sorry, I didn't see this until now. Been busy with PhD stuff.

I'm not working on this right now, so you're welcome to give it a go.

If you go to my repos, I have the freesasa-rs crate on there. It potentially needs some restructuring, but feel free to have a go at implementing it in pdbtbx.

@maxall41

maxall41 commented Feb 12, 2024

IMO pdbtbx should be kept as a pure PDB and mmCIF parser without additional features, to avoid bloat, especially from anything that depends on packages outside of the Rust ecosystem. On a related note, I recently wrote a pure Rust implementation of the Shrake-Rupley algorithm for computing SASA, which can be found here.
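For readers unfamiliar with Shrake-Rupley, the idea can be sketched in a few lines: place test points on each atom's probe-expanded sphere and count the fraction not buried inside any neighbouring expanded sphere. The following is only an illustrative brute-force sketch, not the code of the crate mentioned above; the `Sphere` type, the function names, and the golden-spiral point placement are assumptions made for this example, and a real implementation would use a spatial index instead of the O(n²) neighbour loop.

```rust
use std::f64::consts::PI;

/// Hypothetical minimal atom record: centre and van der Waals radius in Å.
#[derive(Debug, Clone, Copy)]
pub struct Sphere {
    pub center: [f64; 3],
    pub radius: f64,
}

/// Roughly uniform points on the unit sphere (golden-spiral lattice).
fn unit_sphere_points(n: usize) -> Vec<[f64; 3]> {
    let golden_angle = PI * (3.0 - 5.0_f64.sqrt());
    (0..n)
        .map(|i| {
            let z = 1.0 - 2.0 * (i as f64 + 0.5) / n as f64;
            let r = (1.0 - z * z).sqrt();
            let phi = golden_angle * i as f64;
            [r * phi.cos(), r * phi.sin(), z]
        })
        .collect()
}

/// Per-atom SASA in Å², assuming `n_points > 0`.
/// Brute force: every test point is checked against every other atom.
pub fn shrake_rupley(atoms: &[Sphere], probe: f64, n_points: usize) -> Vec<f64> {
    let points = unit_sphere_points(n_points);
    let mut areas = Vec::with_capacity(atoms.len());
    for (i, atom) in atoms.iter().enumerate() {
        let ri = atom.radius + probe; // probe-expanded radius
        let mut exposed = 0usize;
        for p in &points {
            let q = [
                atom.center[0] + ri * p[0],
                atom.center[1] + ri * p[1],
                atom.center[2] + ri * p[2],
            ];
            // A point is buried if it lies inside any other expanded sphere.
            let buried = atoms.iter().enumerate().any(|(j, other)| {
                if j == i {
                    return false;
                }
                let rj = other.radius + probe;
                let d2: f64 = (0..3).map(|k| (q[k] - other.center[k]).powi(2)).sum();
                d2 < rj * rj
            });
            if !buried {
                exposed += 1;
            }
        }
        areas.push(4.0 * PI * ri * ri * exposed as f64 / n_points as f64);
    }
    areas
}
```

Calling `shrake_rupley(&atoms, 1.4, 100)` uses the conventional 1.4 Å water probe; more test points give a smoother estimate at proportionally higher cost.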

@OWissett
Collaborator Author

I have now released an early alpha version of the freesasa-rs library on crates.io.

@maxall41 I see that your version can only calculate SASA on a per-atom basis. Do you aim to add an easy way to get the SASA for residues and chains?

@OWissett
Collaborator Author

Also, @maxall41, I think it would be fine to add this to the library as an optional feature, since the crate already provides more than just parsing, such as the R* tree atom search.

It depends on the vision of @douweschulte for this crate.

I think that what this space potentially needs, to increase the ease of working with PDBs, is functionality similar to that of Biopython in a single crate (with feature flags hiding what you don't need). Maybe we can look at integrating with https://github.com/rust-bio/rust-bio

@maxall41

maxall41 commented Apr 11, 2024

> @maxall41 I see that your version only has the ability to calculate the SASA on a per-atom basis? Do you aim to add the ability to get the SASA for residues and chains easily?

You can easily calculate the per-residue SASA values by just summing the atom SASA values for each residue. Though I may add this internally just for ease of use.
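The summation described above can be sketched as follows. This is an illustrative example only: the `AtomArea` struct and `sasa_per_residue` function are hypothetical names, not the API of any crate discussed in this thread, and insertion codes are ignored for brevity.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-atom result: which residue the atom belongs to and its SASA in Å².
pub struct AtomArea {
    pub chain_id: String,
    pub residue_number: isize,
    pub sasa: f64,
}

/// Sum per-atom SASA values into per-residue totals,
/// keyed by (chain ID, residue number) and returned in sorted key order.
pub fn sasa_per_residue(atoms: &[AtomArea]) -> BTreeMap<(String, isize), f64> {
    let mut totals = BTreeMap::new();
    for atom in atoms {
        *totals
            .entry((atom.chain_id.clone(), atom.residue_number))
            .or_insert(0.0) += atom.sasa;
    }
    totals
}
```

Summing over the chain ID alone, or over all atoms, gives the chain and whole-structure totals in the same way.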

Note: One thing to consider with the previous approach is that it is not deterministic for residues across different proteins, because the number of atoms resolved in the structure for each residue may differ. I haven't seen an implementation that does this differently, though (e.g., see https://github.com/biopython/biopython/blob/master/Bio/PDB/SASA.py). It would also theoretically be more performant to calculate SASA at the residue level instead of the atom level if you only need it per residue, but I don't think that optimization is really necessary, as my implementation is already quite fast.

@OWissett
Collaborator Author

I didn't mean performing the calculation at the residue level, but being able to present it easily for overall structures, chains, and residues. So I agree with you that maybe just adding these as methods to your library would be good.

Have you done a speed comparison with the pure C FreeSASA library?

Also, do you support reporting SASA values for polar, apolar, side chain, main chain, etc., like FreeSASA does?

@douweschulte
Owner

In general, I approve of including more advanced features within the crate. When these might hamper compile times for users, I like to put them behind feature flags (most often this also slims down the number of dependencies).

Besides that, @OWissett brought up the option of including this in rust-bio. If they are up for it, I think we could discuss it further. In the best case, we make it easy for anyone to work with biological data in Rust, and being part of a larger crate (not to mention having more maintainer potential) could be good.

@maxall41

maxall41 commented Apr 12, 2024

> I wasn't meaning to perform the calculation at the residue level, but be able to present it easily for overall structures, chains, and residues. So I agree with you, that maybe just adding these are methods to your library would be good.
>
> Have you done a speed comparison with the pure C FreeSASA library?
>
> Also, do you support report SASA values for polar, apolar, sidechain, main chain, etc... like how FreeSASA does?

I was able to compute SASA values for A0A2K5XT84-F1 (AlphaFold) in 40.06225 ms. Doing the same with freesasa took 94 ms, so my library seems to be a good bit faster than freesasa (roughly 2.3× on this structure). Release profile flags used:

[profile.release]
lto = true
codegen-units = 1
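A single timed run like the one above can be noisy, so averaging over several runs is a common precaution. The helper below is a minimal sketch of that idea using only the standard library; `mean_runtime` is a hypothetical name, it is not how the numbers above were produced, and a dedicated benchmarking harness would give more reliable statistics.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Run `f` `runs` times and return the mean wall-clock time per run.
/// Panics if `runs` is zero.
pub fn mean_runtime<T>(runs: u32, mut f: impl FnMut() -> T) -> Duration {
    assert!(runs > 0, "need at least one run");
    let start = Instant::now();
    for _ in 0..runs {
        // black_box discourages the optimiser from discarding the result.
        black_box(f());
    }
    start.elapsed() / runs
}
```

Wrapping the SASA call in a closure and building with `cargo build --release` keeps the comparison on optimised code.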

I also finished implementing the ability to set the desired level (Atom, Residue, Chain, Protein), and you can now use it (v2.0.0 and higher). I'll try to implement separate apolar and polar return values later.

EDIT (Apr 30 2024): rust-sasa now returns separate polar and apolar totals when using the SASALevel::Protein option.
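As a rough illustration of what a polar/apolar split involves, the sketch below sums per-atom areas into two totals by element. The rule used here (nitrogen and oxygen count as polar, everything else as apolar) is an assumption made for this example; the classifiers in FreeSASA and rust-sasa may differ, and `ElementArea` and `polar_apolar_totals` are hypothetical names rather than either library's API.

```rust
/// Hypothetical per-atom record: element symbol and SASA in Å².
pub struct ElementArea {
    pub element: String,
    pub sasa: f64,
}

/// Assumed rule for this sketch: N and O are polar, all other elements apolar.
fn is_polar(element: &str) -> bool {
    matches!(element.trim().to_ascii_uppercase().as_str(), "N" | "O")
}

/// Return (polar total, apolar total) in Å².
pub fn polar_apolar_totals(atoms: &[ElementArea]) -> (f64, f64) {
    let mut polar = 0.0;
    let mut apolar = 0.0;
    for atom in atoms {
        if is_polar(&atom.element) {
            polar += atom.sasa;
        } else {
            apolar += atom.sasa;
        }
    }
    (polar, apolar)
}
```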
