Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cycle 4 Funding: Performance optimisation and C code review #390

Merged
merged 2 commits into from May 2, 2024

Conversation

manodeep
Copy link
Contributor

@manodeep manodeep commented Mar 1, 2024

The file contains all the relevant info :)

@kelle
Copy link
Member

kelle commented Mar 11, 2024

So happy to this request! Just want to confirm, this is to support you as an independent contractor, yes?

@manodeep
Copy link
Contributor Author

Thank you! Yes, this will be for me as an independent contractor (based in Australia, if that matters)

@mhvk
Copy link
Contributor

mhvk commented Mar 12, 2024

Two somewhat random notes:

  1. In implementing string parsing for Time, I noticed that some of the quick C code that was written was actually faster if done in python using numpy routines. Since then, I've been wary of doing too much in C - to get it really fast requires fairly deep knowledge.
  2. More generally, numpy is improving quite steadly, vectorizing more and more functions. In in numpy 2.0, there is also new API to check what features are available.

More generally, I like that you are thinking of actually improving the code itself, rather than papering over slow stuff with caches or interpolation. The trickiest part may indeed be to find out what are the real bottlenecks.

@manodeep
Copy link
Contributor Author

Two somewhat random notes:

1. In implementing string parsing for `Time`, I noticed that some of the quick C code that was written was actually faster if done in python using numpy routines. Since then, I've been wary of doing too much in C - to get it really fast requires fairly deep knowledge.

I am not surprised at all. Beating numpy requires either extra domain-specific implementation or (as you said) really deep knowledge in C. And as someone with said deep knowledge, I do not recommend going down that route unless absolutely essential. In fact, I can see that a useful project might be to evaluate whether existing C code for specific functions is actually faster than equivalent python/numpy calls and replacing C code wherever possible.

2. More generally, numpy is improving quite steadly, vectorizing more and more functions. In  in [numpy 2.0](https://numpy.org/devdocs/release/2.0.0-notes.html), there is also [new API](https://numpy.org/devdocs/reference/generated/numpy.lib.introspect.opt_func_info.html) to check what features are available.

That's neat! Might have to figure out a way to add that into Corrfunc :)

More generally, I like that you are thinking of actually improving the code itself, rather than papering over slow stuff with caches or interpolation. The trickiest part may indeed be to find out what are the real bottlenecks.

Indeed. Putting bandaids to fix a semi-working code does not really work long-term, and I prefer to solve it in a (potentially) harder but more sustainable way. And I totally agree about the bottlenecks - finding out which functions are the real bottlenecks might be tricky - that's where I hope that the voice of community experience will come in handy, at least for a starting group of potential targets.

@manodeep
Copy link
Contributor Author

manodeep commented Apr 4, 2024

While working on multi-threaded race condition in wcslib, I noticed several potential improvements to the wcslib code. The original intent of my PR was not meant for reviewing code within the cextern directory - however, may be this is an area that also needs attention. Specifically, I have noticed the following potential improvements in wcslib:

  • use of confusing assignment and status checking
  • use of 0x0 as a special value for null-pointers instead of using the standard NULL (which is (void *) 0 in most (all?) systems. )
  • use of [0, 1] as return codes instead of the more portable EXIT_SUCCESS and EXIT_FAILURE (from <stdlib.h>)
  • mixed use of string handling functions for safer (up to a max numchars)/unsafe with strncpy/strcpy. All should be replaced with safer strlcpy (which will truncate the source string if necessary but guarantee a terminating \0 in the destination string).
  • use of a static variable for static int WCSSET = 137 in multiple C files which would be better off as a macro within wcs.h

These changes above will improve code readability and make the C code more compliant with conventions, but the current C code in wcslib (that I have reviewed so far) should generate correct output under most/all (non-hostile) circumstances.

Additional context for the assignment + status checking

There are a lot of potentially confusing statements where an assignment is done and status checked in the same line throughout the wcslib C codebase under cextern. These statements would be better off being split into two and replacing 0x0 with NULL, e.g.,

if ((wcs->types = calloc(naxis, sizeof(int))) == 0x0) {

//ideally should be broken into two lines as
wcs->types = calloc(naxis, sizeof(int));
if (wcs->types == NULL) {

However, making such changes within astropy might cause the cextern package to diverge from upstream and potentially cause maintenance issues in the future. If such changes might be useful for the project, and there are funds for tackling them into wcslib, then I am happy to take it on (either as part of this FR or a future one). Assuming the existing tests are exhaustive, I would estimate the changes would take ~40 hours to incorporate.

@eteq
Copy link
Member

eteq commented Apr 5, 2024

Please react to this comment to vote on this proposal (👍, 👎, or no reaction for +0)

@dhomeier
Copy link
Contributor

However, making such changes within astropy might cause the cextern package to diverge from upstream and potentially cause maintenance issues in the future. If such changes might be useful for the project, and there are funds for tackling them into wcslib, then I am happy to take it on (either as part of this FR or a future one). Assuming the existing tests are exhaustive, I would estimate the changes would take ~40 hours to incorporate.

Would be great to see improvements here, but I think we'd definitely not want to get into using a forked version of WCSLIB, so one should try to push any such enhancements upstream – since that is more or less single-handedly maintained by Mark Calabretta, this will probably not follow the standard repository procedure, but I think he is entirely open to useful reports and patches. Probably @mcara can comment more on how this might work out.

@pllim
Copy link
Member

pllim commented Apr 12, 2024

Definitely worth a try but I think it would be much more effective if a long-time astropy maintainer could be a mentor/champion of the resulting PRs (cc @astrofrog). Also I feel like HPC community would also be interested (cc @weaverba137). Thanks!

@pllim
Copy link
Member

pllim commented Apr 12, 2024

From the sidelines, Calabretta seems very responsive and receptive to patches if you can prove to him they are valid patches.

@manodeep
Copy link
Contributor Author

However, making such changes within astropy might cause the cextern package to diverge from upstream and potentially cause maintenance issues in the future. If such changes might be useful for the project, and there are funds for tackling them into wcslib, then I am happy to take it on (either as part of this FR or a future one). Assuming the existing tests are exhaustive, I would estimate the changes would take ~40 hours to incorporate.

Would be great to see improvements here, but I think we'd definitely not want to get into using a forked version of WCSLIB, so one should try to push any such enhancements upstream – since that is more or less single-handedly maintained by Mark Calabretta, this will probably not follow the standard repository procedure, but I think he is entirely open to useful reports and patches. Probably @mcara can comment more on how this might work out.

Yeah that's my concern too. I would prefer changes to wcslib to be incorporated upstream so that it is less of a maintenance burden. Even the second/future part that I have suggested for modernising the code would also be better implemented upstream. Given the downvote on the idea about the changes from @mcara, that probably is a sign that the (second/future) changes should either not be done or at least be done upstream.

@manodeep
Copy link
Contributor Author

From the sidelines, Calabretta seems very responsive and receptive to patches if you can prove to him they are valid patches.

Given the email exchanges so far, Mark does seem fairly responsive and reasonably open to discussion. Let's see what he thinks about my proposed pthreads patch for #16245

@mcara
Copy link

mcara commented Apr 12, 2024

Mark's style in supporting WCSLIB is to do all code modifications himself, which is different than the astropy approach. Nevertheless, he has done a great service supporting this library, and such suggestions with regard to personal style may be offensive to him. Personally, I don't see most of these warranting forking the code or engaging in an effort to persuade him to do things differently. One risk is that he offloads all support to the critics. Is that desired?

As has been mentioned Mark has generally fixed things that he has been shown to be bugs or errors.

@manodeep
Copy link
Contributor Author

Mark's style in supporting WCSLIB is to do all code modifications himself, which is different than the astropy approach. Nevertheless, he has done a great service supporting this library, and such suggestions with regard to personal style may be offensive to him. Personally, I don't see most of these warranting forking the code or engaging in an effort to persuade him to do things differently. One risk is that he offloads all support to the critics. Is that desired?

As has been mentioned Mark has generally fixed things that he has been shown to be bugs or errors.

Forking and maintaining is definitely not desired, especially since (as you noted) the issues do not fundamentally impact correctness. Plus, the goal is to reduce the long-term effort, and not add to it :D

@jdswinbank
Copy link
Contributor

I'm writing on behalf of Astropy's Finance Committee. We are pleased to be able to let you know that, following consultation with the community, we are able to approve this request. We can currently fund the one year of work in the amount of $10,200 (US). Future funding beyond this amount will be contingent upon the availability of funds. (We will be using the full budgets of all of the approved requests to craft future grant and funding proposals.) We assume you will be using this full year 1 budget; if that is not the case, please contact us immediately.

Ana Gabela and I will be your contacts on the Finance Committee to facilitate this award. Please get in touch with us if you have any questions or concerns. Please do not reach out to NumFOCUS directly.

In addition, new to this funding cycle, is the assignment of a COTR (Contracting Officer's Technical Representative) to each funded project. This concept is borrowed from government funding agencies, although it is to be stressed that Astropy's goal is to make the COTR role as low-overhead as possible. The COTR’s primary responsibility is to make sure the work is happening at the expected pace and, if necessary, to be a liaison between the funded project and the Finance Committee or CoCo. The COTR for your project will be assigned shortly and we’ll also be sending out more details about how we see this working.

The next steps are:

  • Please make sure that you are registered with Open Collective: this is the system you will use to submit your invoices for payment. Note that your invoices should be submitted through the Astropy NASA project.
  • We will reach out through NumFOCUS, Astropy's Fiscal Sponsor, to set up an “independent contractor agreement” (ICA; effectively a contract) and formal scope of work for the project.
  • While this is in progress, the Finance Committee and the Coordination Committee will identify a COTR for each project. More information on this will be in the issue for your project once the COTR is identified.

We expect the work to be carried out between 2024-05-01 and 2025-04-30 (the “period of performance”). If you come to realize that the work will take more time than originally planned, you can apply for an extension following the procedure described at https://github.com/astropy/astropy-project/blob/main/finance/process/request-extension.md.

Congratulations — we are really looking forward to seeing you put these funds to good use!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants