RMSD with multicore and timeout functionality #113
base: develop
Conversation
…tionality Note: MyPy and Flake8 don't pass (yet)
Thanks @Jnelen for the nice contribution! This is just a first quick pass (it's late...), I'll have a deeper look tomorrow.
@Jnelen sorry, I clicked "Update Branch" without too much thinking. You'll have to pull before pushing further changes, otherwise you will get an error. Sorry about that!
General cleanup and improvements Co-authored-by: Rocco Meli <r.meli@bluemail.ch>
Make rmsd_timeout a private function. MyPy still complains when comparing num_workers with os.cpu_count().
It now supports mol-mol, mol-list and list-list as input. In the case of mol-list, all calculations are performed separately to take advantage of multiple cores.
Please test the suggested changes. I'm a bit short in time these days so I did not test them myself, sorry.
Co-authored-by: Rocco Meli <r.meli@bluemail.ch>
Refactor to prmsdwrapper. Some improvements to typing.
For the Value ctype I picked a regular, single-precision float. However, we could also use a double-precision float if you think this is more appropriate!
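For reference, the precision choice only changes the ctype handed to `multiprocessing.Value`; a minimal sketch (the variable name is illustrative, not the PR's actual code):

```python
import ctypes
from multiprocessing import Value

# Shared single-precision float; swap in ctypes.c_double for double precision.
rmsd_result = Value(ctypes.c_float, 0.0)

# Value carries its own lock for safe cross-process updates.
with rmsd_result.get_lock():
    rmsd_result.value = 1.5  # exactly representable in single precision
```

Single precision is usually plenty for an RMSD in Ångström, but values not exactly representable in 32 bits will be rounded when read back.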
I just suppressed the 2 remaining MyPy errors. One of them should hopefully get fixed by MyPy (see the earlier issue I referred to); the other is the problem that os.cpu_count() can return None.

Functionally I think it's basically how we want it, but perhaps some parts of the implementation can still be cleaned up? Besides that, I suppose some tests should be added as well? The timeout feature should be relatively easy to test (by using one of the "problematic" compounds that made me start looking into this in the first place). But I'm not sure how to properly test the multicore feature? By the way, there is no rush in getting this finished quickly, so please only review it if you actually have the time!
Yes, we need tests. =)
Yes. I think the worst one you reported is a great candidate.
I think GitHub Actions have access to multiple cores, but I should double check. In any case, the test would be on correctness. Eventually we could also add a benchmark test, but it's not a priority IMO. If you have some results from your work (where you see an improvement), you can just post the timings here for future reference. But not a priority, the important thing is that it's correct.
Thanks. You are doing great work, so I don't want it to stall. ;)
spyrmsd/rmsd.py
```python
# Cast the molecules to lists if they aren't already
if not isinstance(molrefs, list):
    molrefs = [molrefs]
if not isinstance(mols, list):
    mols = [mols]

# Match the length of the molref
if len(molrefs) == 1 and len(molrefs) < len(mols):
    molrefs = molrefs * len(mols)
```
What we want to do is to parallelise on graph isomorphisms, since computing the RMSD is rather fast with NumPy vectorisation. Therefore, I think the use case of multiple references each with multiple different poses (i.e. in the case of docking), is an important use case that is not covered, is it?
The current behaviour is like this: if a user inputs one molref and a list of mols (`[mol1, mol2, ...]`) to compare it to (so these could be different docked poses), multiple `rmsdwrapper` calculations are performed across different cores like this:

```python
rmsdwrapper(molref, mol1)  # core 1
rmsdwrapper(molref, mol2)  # core 2
...
```

So this is just an embarrassingly parallel approach using the "regular" `rmsdwrapper`; the graph isomorphism calculations aren't even really being parallelized explicitly. Thinking more about it, I'm honestly not even sure how this interacts with the graph caching either (because of the multicore aspect).
My implementation had mostly my own use-case in mind, which I thought would generalize decently well. Basically I wanted to accelerate the workload of calculating the RMSD of a few ligandposes from different docking methods after a virtual screening run. However, considering your insights, maybe my approach was/is too simplistic and needs to be improved. I do think this would introduce even more complexity, so we will need to think things through very carefully (so basically continue at it like we have been since the start haha).
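A minimal sketch of that embarrassingly parallel pattern with the standard library (a toy `pair_rmsd` stands in for the real `rmsdwrapper`; all names here are illustrative, not the PR's code):

```python
from concurrent.futures import ProcessPoolExecutor

def pair_rmsd(pair):
    # Toy stand-in for rmsdwrapper(molref, mol): just a scalar distance here.
    ref, mol = pair
    return abs(ref - mol)

def parallel_rmsd(molref, mols, num_workers=2):
    # One independent (molref, mol) task per molecule; the pool spreads
    # them across worker processes, i.e. across cores.
    pairs = [(molref, mol) for mol in mols]
    with ProcessPoolExecutor(max_workers=num_workers) as executor:
        return list(executor.map(pair_rmsd, pairs))
```

Because each task is a full, independent `rmsdwrapper` call, the graph isomorphism inside each call is recomputed per task, which matches the caching concern raised above.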
I added some very basic tests for the prmsdwrapper function. (I basically copied the tests with the rmsdwrapper function and used the prsmdwrapper function instead). I also added a test for the timeout function. However one thing I noticed is that for some reason this runs twice: once with a True and once with a False variable. When the tests were giving errors I got this:
I fixed the error of the assert, but the test still runs twice. I'm still confused since there isn't a parametrize, or even an input variable. I couldn't immediately figure it out, but I assume it's getting this from some earlier statement? Of course there is still some work left improving the actual prmsdwrapper function, but I suppose some basic tests can also add sanity checks to ensure we're not breaking stuff.
Co-authored-by: Rocco Meli <r.meli@bluemail.ch>
Thanks @Jnelen! I'm a bit swamped this week too, but I'll try to have a look and reply ASAP. Sorry for the delays.
Yes, this is a bit confusing. I think the two runs are coming from the following fixture, which is automatically applied to all the tests of the file (Lines 11 to 15 in 08764e1).
It's used to trigger a separate code path deep inside the minimized RMSD calculation. So the two runs are normal and you should not worry about them, but it's great that you look this closely at what's happening!
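For context, a parametrised `autouse` fixture is exactly the kind of thing that makes every test in a module run once per parameter value without any visible `parametrize` on the test itself; a hypothetical sketch (not spyrmsd's actual fixture):

```python
import pytest

@pytest.fixture(autouse=True, params=[True, False])
def minimize_flag(request):
    # autouse: applied to every test in the module without being requested
    # explicitly; params: each test runs once with True and once with False.
    return request.param
```

Any test collected in the same module would then appear twice in the pytest output, tagged `[True]` and `[False]`.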
No problem at all. Actually, I will probably also be quite inactive until Wednesday next week. I should still be able to respond to some comments, but I probably won't be able to change or test any new code until then. Just as a heads-up!
Ah that explains it, thanks for the clarification!
I'm back from my break! Right before that I actually updated my use-case (ESSENCE-Dock) to work with a time-out and multicore, but I ended up using Pebble since this handles both at the same time. For my specific use-case I think it was the most elegant solution, and I use a Singularity image anyway, so I didn't mind adding an additional dependency.

However, this got me thinking about whether we should change the current multicore and timeout implementation. Maybe I had my (specific) use-case too much in mind? It definitely is functional, and will work just fine to process a large number of molecules across different CPUs. However, the calculation of graph isomorphisms isn't currently accelerated; for that I assume we need to look into specific solutions for the different backends (since I assume they can take care of the multicore support)? I could also see how this could lead to problems with the current multicore implementation, since it will want to use multiple cores in each process. That's why I'm wondering if we should make larger changes to the current version.

Finally, as for a benchmark: I haven't run a proper one yet, but I'll try to run one later with the current implementation.
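For reference, the "manual" pre-Pebble approach amounts to spawning a process per task and hard-killing it when it exceeds the timeout (which is also what Pebble does for you on timeout). A minimal stdlib sketch, with illustrative names and a `time.sleep` standing in for a slow RMSD calculation:

```python
import math
import multiprocessing as mp
import time

def _work(conn, seconds):
    # Stand-in for a potentially slow RMSD calculation.
    time.sleep(seconds)
    conn.send(seconds)

def run_with_timeout(seconds, timeout):
    parent_conn, child_conn = mp.Pipe()
    proc = mp.Process(target=_work, args=(child_conn, seconds))
    proc.start()
    proc.join(timeout)
    if proc.is_alive():
        # Worker exceeded the timeout: terminate it and signal failure.
        proc.terminate()
        proc.join()
        return math.nan
    return parent_conn.recv()
```

The bookkeeping above (pipes, join, terminate) per task is what libraries like Pebble wrap into a pool with a per-item `timeout`.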
Hi, I finally got around to working on a simple benchmark, but I have started to notice quite a notable difference between running the calculations the first time versus subsequent times. Here is a concrete example for the same dataset.
Versus the second (and subsequent) time(s):
Do you know why this is? I'm not sure, but I think this is because of the loading of the molecules, which are maybe cached in subsequent runs? Also note the discrepancy between the real time and the user time in the first run. I tried several different datasets, and this pattern seems to be very common. Do you know what could be happening? I will try to follow up later with some concrete "benchmarks" and more context!
Alright, I did some testing, and the performance of the parallel implementation is a lot worse than I thought it would be, to be honest. In the edge-case tests I was doing it seemed good, because even if some "problematic" molecules were present it would still finish in a reasonable amount of time, whereas with the "regular" implementation it would get killed or take a very long time. However, in more "normal" scenarios, the current prmsdwrapper seems to degrade the performance by a lot. I used some docking examples from DUD-E to test. This example is for HS90A, where I used the LeadFinder (LF) and Gnina (GN) results to calculate the RMSD for 4938 compounds. You can find the scripts I used here: benchmark_scripts.zip. Anyway, here are my numbers:
Something else I noticed is that error handling in the
There is no rush at all! Please only do it when you have the time!
Perfect, I just added it!
Thanks @Jnelen, looking good! I left some comments, but we are getting there, thanks for your patience.
```python
num_workers = min(num_workers, os.cpu_count()) if os.cpu_count() is not None else 1  # type: ignore[type-var]

# Cast the molecules to lists if they aren't already
if not isinstance(molrefs, list):
```
Parallelisation is performed over graph isomorphisms, so I don't think a single reference structure is particularly interesting to have. Is this for the timeout functionality only? If yes, it might be worth adding a note in the docstring, explaining the two (non-exclusive) functionalities of this function and that parallelisation only makes sense for multiple molecules.
Maybe it adds useless overhead, but it might be worth checking `num_workers` and raising a warning if a single molecule is passed with `num_workers > 1`? But maybe it's not worth it, because it would prevent setting `num_workers = None` by default.
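The `os.cpu_count()` wrinkle behind the MyPy suppression above is that it can return `None`; a hedged sketch of a `None`-safe clamp (illustrative, not the PR's exact code):

```python
import os

def resolve_num_workers(num_workers=None):
    # os.cpu_count() may return None when the count is undetermined;
    # fall back to 1 in that case.
    available = os.cpu_count() or 1
    if num_workers is None:
        # None means "use everything available".
        return available
    # Never request more workers than there are cores.
    return min(num_workers, available)
```

Resolving `None` up front like this also sidesteps the `type: ignore[type-var]` on the `min(...)` comparison.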
Yes, if there is only one structure it makes less sense to use this method, unless you want to use the timeout feature like you mentioned. Another scenario could be where you want to check just one structure against a lot of reference structures, which would also be executed in parallel. (For example, if you docked a structure 100 times and want to compare the top structure directly with all the other compounds, it would treat every RMSD calculation separately and run in parallel across multiple cores.) In this case, it would basically match the single input structure to all the reference molecules for you, and calculate each pair separately using `rmsdwrapper`. So `rmsdwrapper(mol, refmol1)`, `rmsdwrapper(mol, refmol2)`, ... is computed across all the available cores.
Another scenario could be where you want to check just one structure against a lot of reference structures, which would also be executed in parallel.
Yes, but in this case you would have to re-compute the graph isomorphism each time, which might be the most expensive thing. I'm starting to wonder if these two cases (one reference with many corresponding structures, or multiple references with few corresponding structures) are different enough to distinguish? The first case would definitely benefit from computing the graph isomorphisms once, and then parallelising over the RMSD calculations (although these are quite fast, so the overhead of parallelisation might be too much...).
What do you think? Do you have a feel about how much is the parallel overhead compared to just the RMSD calculation?
Yeah, so the parallel method is "stupid" in the sense that it re-computes the graph isomorphism each time (which I agree is not efficient). However, I think that would need some more changes (maybe cache it somehow so it can be reused?). Also, the graph isomorphism calculations aren't performed in a multicore manner either. In an ideal world the isomorphisms would be calculated using all the assigned num_workers and be cached. After that, the RMSDs could be computed across all num_workers (single core) and reuse the cached isomorphisms until all calculations are finished. However, I don't have the bandwidth to work on that. The initial goal for me was to have a function with a timeout so it can run more stably on libraries that have "problematic" molecules (like DrugBank).
Also, for your question about overhead: with Pebble there still is some overhead, but compared to spawning the processes manually as I did in the first try, it looks to be relatively small (but of course not 0). I would say around 10-20% overhead maybe? Here are the benchmarks I shared before:
- regular `rmsdwrapper` function: 12.00s
- `pebblermsdwrapper` with num_worker=1: 14.27s
- `pebblermsdwrapper` with num_worker=2: 7.69s
- `pebblermsdwrapper` with num_worker=4: 4.93s
So using 2 cores already gives faster performance than 1 (which in my opinion already seems worth it, especially with the timeout feature).
However I think that for this there would need to be made some more changes (maybe cache it somehow so it can be reused?).
Yes, this is how a standard RMSD calculation already works, if you are passing a single reference and many structures to compare it with. The isomorphism is computed only once and cached:
Lines 239 to 240 in 62c5df3:

```python
cache: bool
    Cache graph isomorphisms
```
So one would have to use `_rmsd_isomorphic_core` in a similar way in order to implement parallelisation over RMSD calculations with a single graph isomorphism.
However I don't have the bandwidth to work on that.
Totally understandable. And so that there are no misunderstandings: I'm not suggesting to implement both directly, but to narrow the scope of the one you are implementing (parallelisation on graph isomorphism, and timeout), and leave the other less interesting case (parallelisation over RMSD calculations with the same isomorphism) as a TODO.
Also for your question about overhead: with Pebble there still is some overhead, but compared to spawning the processes manually as I did in the first try, it looks to be relatively small (but of course not 0).
Yes, I was wondering how the overhead compares with RMSD calculations only, instead of graph isomorphism + RMSD. The former is vectorised with numpy, so it should be very fast (and the overhead would have much more impact).
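The "compute the isomorphism once, parallelise only the cheap RMSD evaluations" idea could be sketched like this. All names are toy stand-ins (an identity atom mapping and a scalar RMSD), not spyrmsd's `_rmsd_isomorphic_core`:

```python
from concurrent.futures import ProcessPoolExecutor
from functools import partial

def atom_mapping(ref_atoms, mol_atoms):
    # Stand-in for the expensive graph-isomorphism search:
    # here simply an identity mapping between atom indices.
    return list(zip(range(len(ref_atoms)), range(len(mol_atoms))))

def rmsd_with_mapping(coords_mol, mapping, coords_ref):
    # Fast per-pose RMSD once the atom mapping is known (1D toy coordinates).
    n = len(mapping)
    return (sum((coords_ref[i] - coords_mol[j]) ** 2 for i, j in mapping) / n) ** 0.5

def prmsd_single_reference(coords_ref, poses, num_workers=2):
    # All poses share the reference topology: compute the mapping once,
    # then parallelise only the cheap per-pose RMSD evaluations.
    mapping = atom_mapping(coords_ref, poses[0])
    f = partial(rmsd_with_mapping, mapping=mapping, coords_ref=coords_ref)
    with ProcessPoolExecutor(max_workers=num_workers) as executor:
        return list(executor.map(f, poses))
```

Since the per-pose step is fast, whether the pool overhead pays off here is exactly the open question in this thread.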
Improve prmsdwrapper feedback
I just pushed an update where I tried to address most of your comments.
add shorter warning when not all compounds were successfully processed
Alright, I made some minor changes (I added a check as you suggested), and also changed the printing to a more compact warning.
That would be great!
I think we can simplify the interface to accept a list of references and a list, or a list of lists, of molecules?

```python
molrefs: List[molecule.Molecule],
mols: Union[List[molecule.Molecule], List[List[molecule.Molecule]]],
```

Is that compatible with your use case? Otherwise we can keep it as-is, and I can eventually clean it up later if needed.
Alright, I'll try to think about it and see what I can come up with.
Yes it is, but in that case, if you pass a single molecule, the user just has to pass it as a list (which shouldn't be a problem if it's documented properly). I also got a response on the issue I opened. However, I'm quite busy this week, so I'm not sure when I will get around to it. Maybe I'll find some time tomorrow, otherwise it'll be later this week.
Thanks for the update @Jnelen, sounds great.
No problem, don't sweat it. There is no rush, and there is no timer attached to an open PR! ;)
Check if proper warnings are raised (on some compound failures as well as setting chunksize and timeout). Check if setting a different chunksize works in the test scenario (with and without timeout).
I just added some more tests for Also for the
@Jnelen apologies for the massive delay here! The kid was sick, then I was away for work, and finally I was sick too. I should have some time to have a proper and final look at this towards the end of the week.
No need to apologize, friend. Hope you and your kid are feeling better now! Take your time and I'll get back to it when you have found the time to take a look!
Just a few comments while I'm slowly getting back into this. I'm really sorry for the massive delay!
Don't worry about it! I'll try to take a detailed look at your comments tomorrow evening.
Make warning message more comprehensive
I simplified the num_workers aspect of the function, thanks for pointing that out! Also, tests are failing on GitHub CI, but the problem seems to be with the setup and installation of the packages? Did something about the configuration/setup change?
Cycling to restart CI.
Sorry again for the delay @Jnelen, and thank you for your patience. I've been swamped both at work and outside of work.
I had another good look at the PR and left quite a few comments, but I think we are getting there. I'll play around a bit more with it in the coming days and eventually add more comments as needed.
Thanks a lot for your review! I'll try to work on and improve it later this week(end).
Thanks for your thorough review @RMeli! I tried to address most issues. I added an example test for the more realistic test case (with chunksize=2 and a timeout), but I made it skip, since in the current implementation we force the chunksize to be 1 when a timeout is set. However, I think that for debugging/developing purposes this can still be interesting, which is why I added it anyway (and made it skip so it doesn't give an error for now).
I agree on the interest, but I think it would be better to leave it as a comment here in this PR instead of within the codebase. As I mentioned in the last comment, I completely forgot we went for exclusivity of the two features, so some of my comments were a bit confusing/confused. Apologies for the confusion.
Test for the hypothetical case where a timeout on chunks of size 2 makes the whole chunk fail. The test checks that the whole chunk is set to `np.nan`. Currently, specifying a timeout will automatically change the chunk size to 1, so the test is skipped:

```python
@pytest.mark.skip(
    reason="Chunksize is currently automatically set to 1 when a timeout is set"
)
def test_prmsdwrapper_mixed_timeout(muparfostat, benzene) -> None:
    muparfostat_mol = copy.deepcopy(muparfostat)
    benzene_mol = copy.deepcopy(benzene)

    lst_1 = [
        muparfostat_mol,
        benzene_mol,
        benzene_mol,
        benzene_mol,
        benzene_mol,
        muparfostat_mol,
    ]
    lst_2 = [
        muparfostat_mol,
        benzene_mol,
        benzene_mol,
        benzene_mol,
        benzene_mol,
        muparfostat_mol,
    ]

    # Currently we force the num_workers to be 1 when there is a timeout set,
    # but this could make it easier to debug things in the future
    RMSDlist = prmsdwrapper(lst_1, lst_2, timeout=1e-3, num_workers=1, chunksize=2)

    expectedResult = [np.nan, np.nan, 0, 0, np.nan, np.nan]
    np.testing.assert_allclose(RMSDlist, expectedResult, atol=1e-5)
```
I added some clarifications in the code! I saw we still had some conversations open, but they were mostly discussions about things that can be improved in the future.
As we discussed in #105: here is my first shot at implementation of RMSD functions with multicore and timeout functionalities.
For now, I kept the `rmsd_timeout` function public, however we could also just make it private. This function supports a molref with multiple mols, something that the multicore function doesn't support directly (everything is done pairwise here). Although I guess there is an easy workaround for the user? Maybe it's even better to implement this ourselves: if only one mol is supplied as the ref mol, extend the list to match the length of the mol list (like this: `[refmol] * len(mols)`). This would also give each molecule a separate timeout. Having the possibility of a mol list in the `rmsd_timeout` function also made me return the raw result of `rmsdwrapper` (a list) instead of the value (because otherwise only one value would be returned). So if we make it private, it could clean things up and also emulate the regular `rmsdwrapper` functionality better! (I just came up with this while writing this description; if you like it, I will implement it tomorrow!)

I also still need to add tests. I guess we could add a test for a timeout, a test for matching mols and molrefs lists, ... But multicore might be harder to test?
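The `[refmol] * len(mols)` idea from the description could be sketched as a small input-normalisation helper (the name `broadcast_refs` is hypothetical):

```python
def broadcast_refs(molrefs, mols):
    # Accept single molecules or lists for both arguments.
    if not isinstance(molrefs, list):
        molrefs = [molrefs]
    if not isinstance(mols, list):
        mols = [mols]
    # A single reference is repeated to match the number of molecules,
    # so each pair gets its own calculation (and its own timeout).
    if len(molrefs) == 1 and len(mols) > 1:
        molrefs = molrefs * len(mols)
    if len(molrefs) != len(mols):
        raise ValueError("Mismatch between number of references and molecules")
    return molrefs, mols
```

Repeating the reference this way only duplicates list entries (references to the same object), so it costs essentially nothing even for large inputs.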
Any feedback welcome as always!
Note: I intentionally kept all the new functions and imports together at the bottom so it's clearer to see what's added. This means Flake8 doesn't pass. MyPy also doesn't pass yet, but I'll get to fixing that tomorrow as well!