Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create a native (modern) C++ implementation of prima (to also use in R) #136

Open
samuel-watson opened this issue Jan 9, 2024 · 9 comments
Labels
c++ Issues related to C++

Comments

@samuel-watson
Copy link

There is interest in developing a native C++ implementation of the algorithms in prima. While there are C bindings for the Fortran library, being able to incorporate the algorithms into C++ projects natively perhaps through a header only library would simplify compilation in some settings and possibly provide some efficiency gains. One such use case is to provide access to the algorithms in R and in C++ projects in R (through Rcpp). Fortran functions can be used directly in R but they are a little inefficient as they require copies of all the arguments, whereas C++ functions can operate through pointers, and building a package that passes the strict CRAN standards that requires building the Fortran library and then linking through C is very difficult.

In terms of process, there are Fortran to C++ automatic translators, such as fable. The output can then be manually edited. I do not know what standard of C++ it targets, however, I would be keen to aim to make use of tools in the standard library C++17 or even C++2x, particularly the tools to support memory safety. However, support for C++2x is limited in some settings (for example, it is not possible to use C++20 in R with the Eigen library - a popular linear algebra library - because CRAN rules have significantly delayed updating the R version of Eigen).

So my proposal is to create a header-only native C++ implementation of prima targeting C++17 (at least initially). I also propose adding a pre-compiler flag to allow for a R build version if needed through Rcpp and then to incorporate it into an R package to submit to CRAN. I will add some suggestions below about the API: to be maximally useful the main class will ideally need to handle binding free functions, member functions, and const member functions for minimisation while having no overhead for function calls.

@zaikunzhang
Copy link
Member

Hi @samuel-watson ,

Thank you very much for raising this point.

A native, modern, and high-quality C++ implementation of PRIMA is of extreme importance to us. Such an implementation may replace the modern Fortran implementation as the code that actually does the computation when people use PRIMA. This will substantially improve the availability and usability of PRIMA. The benefit cannot be overestimated.

Many thanks.

Best regards,
Zaikun

@zaikunzhang zaikunzhang added the c++ Issues related to C++ label Jan 9, 2024
@zaikunzhang
Copy link
Member

zaikunzhang commented Jan 9, 2024

My knowledge of C++ is limited and outdated. So I cannot do much coding for this implementation and will have to rely on the community, but I will be more than happy to assist in other ways.

Before we start, I would like to draw your attention to the following.

  1. PRIMA is designed to solve derivative-free optimization problems. In such problems, the objective/constraint functions are black boxes defined by expensive simulations. Each function evaluation can take seconds, minutes, hours, days, or even months. Consequently, the cost incurred inside the solver does not matter. See here for more elaboration. Keeping this in mind, given an algorithm, when we need to choose between efficiency (in terms of flops, memory, etc) and simplicity&clarity of the code, we should always choose simplicity&clarity. (Note that the algorithm is given, so different implementations should consume the same number of function evaluations, which dominate the cost).

  2. Something I wrote when I started to work on PRIMA:
    https://fortran-lang.discourse.group/t/fortran-an-ideal-language-for-writing-templates-of-numerical-algorithms/2054

  3. Coding as a way of documenting human knowledge:
    https://everythingfunctional.wordpress.com/2021/01/19/communicating-with-whom/

  4. Writing slow code:
    https://fortran-lang.discourse.group/t/writing-slower-go-programs-bitfield-consulting/5733

  5. Testing and verification play a central role in the development of PRIMA. We have to think about how to systematically verify the correctness of the C++ implementation. See here for more elaboration.

  6. I avoided using derived types (i.e., structures) in the Fortran implementation. This is because they will become an obstacle when interfacing Fortran with other languages. I do not think this will be the case for the C++ implementation.

  7. See here for the comments I made on the initiative of @nbelakovski to develop a native Python implementation: Creating a Python reference implementation #34

  8. I have written some notes on translating PRIMA from Fortran to other languages.

Many thanks,
Zaikun

@samuel-watson
Copy link
Author

Hi @zaikunzhang thanks for the detailed information. Just a couple of responses
0. I think this is broadly correct; however, reducing overhead does have some real world benefit in some scenarios. My interest comes from statistical model fitting and with the old BOBYQA implementation I can end up calling the function thousands of times in a single fit for complex models. So the cumulative effect of avoidable overheads can add up. However, I think the guiding principle that the API simple and clear like you say is key - I think both are achievable here.
4. Does there exist a good test suite of problems we can translate as well? We can also add some C++ specific unit tests.
7. These are really helpful. C++ doesn't really have matrices natively (although this might change for C++23/26) so row/column major ordering is often down to how the user implements it. That being said, it probably makes sense to use a well tested and widely used linear algebra library to avoid any pitfalls in trying to implement matrices using standard containers, and to ensure the best algorithms are used for linear algebra operations. With that in mind, Eigen is probably the industry standard and in my experience has been excellent to use both in terms of ease of use, speed/efficiency, and error catching.

@zaikunzhang
Copy link
Member

  1. I think this is broadly correct; however, reducing overhead does have some real world benefit in some scenarios. My interest comes from statistical model fitting and with the old BOBYQA implementation I can end up calling the function thousands of times in a single fit for complex models. So the cumulative effect of avoidable overheads can add up. However, I think the guiding principle that the API simple and clear like you say is key - I think both are achievable here.

Right. It is an optimization problem:

minimize overhead subject to code clarity and simplicity.

In this problem, the constraints (code clarity and simplicity) should be satisfied before we minimize the overhead. However, this does not mean that we do not care about overhead.

  1. Does there exist a good test suite of problems we can translate as well? We can also add some C++ specific unit tests.

CUTEst is the standard test set. However, I am not sure how it works with C++.

  1. These are really helpful. C++ doesn't really have matrices natively (although this might change for C++23/26) so row/column major ordering is often down to how the user implements it. That being said, it probably makes sense to use a well tested and widely used linear algebra library to avoid any pitfalls in trying to implement matrices using standard containers, and to ensure the best algorithms are used for linear algebra operations. With that in mind, Eigen is probably the industry standard and in my experience has been excellent to use both in terms of ease of use, speed/efficiency, and error catching.

I agree that Eigen is a good choice, unless there is intrinsic support for array operations in the language.

Thank you.

@TiborGY
Copy link

TiborGY commented Jan 26, 2024

A temporary solution could be a C++ API to the current Fortran implementation. If NLOPT could use the implementations within PRIMA, that would immediately make it much easier to use PRIMA from C++ programs, and it would be a drop-in-replacement for any programs currently using these algorithms though NLOPT.

Furthermore, the improved local optimizer implementations would immediately also improve some of the global optimization algorithms in NLOPT
For example I am currently using NLOPT to run the MLSL global optimizer, which requires the specification of a local optimizer. Being able to use the improved version of BOBYQA in NLOPT would likely help MLSL locate good solutions faster.

@zaikunzhang
Copy link
Member

A C++ API would be wonderful as well.

Given that we already have a C API, how difficult is it to get things to work in C++? It has been a long time since I used C/C++ last time, so excuse me for my ignorance.

@TiborGY
Copy link

TiborGY commented Jan 27, 2024

A C++ API would be wonderful as well.

Given that we already have a C API, how difficult is it to get things to work in C++? It has been a long time since I used C/C++ last time, so excuse me for my ignorance.

I do not see any actual API documentation there. Examples are nice, but are not a substitute for API docs. IMO there should be a list of functions which are intended for outside use, some brief text for each of them about what they do, and some sort of API stability declaration, eg. "We will not remove these functions without having a deprecation notice in at least one release". It does not have to be worded like that of course, but it should be documented what the "contact surface" is and not change that without notice.
Doxygen is a classic tool for generating documentation from C/C++ source files.

I have not actually tried using PRIMA yet, as I mostly need global optimization, hence my use of NLOPT's MLSL, which AFAIK can only use one of NLOPT's local optimizers as the local optimizer. So at the moment I cannot really derive a benefit from PRIMA.
But in general, calling C functions from C++ is very easy.

@zaikunzhang
Copy link
Member

zaikunzhang commented Jan 27, 2024

Sure, there are many many documentions missing, not only for the C API. I hope I had enough time to work things out, in addition to my job, but I don't. For the moment I can only work on the Fortran backend and the MATLAB interface, while relying on community efforts for other parts.

@samuel-watson
Copy link
Author

You can use the C interface in a C++ program as is, but the lack of an API documentation is right. However, my interest in a purely C++ version is to simplify compilation and facilitate incorporation of these really useful algorithms in more projects- I've given the examples of facilitating the use of these algorithms in programs for R as they could be added as a package for R as headers, even if only as an independent project. I would suggest API documentation should be raised as a separate issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues related to C++
Projects
None yet
Development

No branches or pull requests

3 participants