Create a native (modern) C++ implementation of prima (to also use in R) #136

samuel-watson · 2024-01-09T18:06:36Z

There is interest in developing a native C++ implementation of the algorithms in prima. While there are C bindings for the Fortran library, being able to incorporate the algorithms into C++ projects natively perhaps through a header only library would simplify compilation in some settings and possibly provide some efficiency gains. One such use case is to provide access to the algorithms in R and in C++ projects in R (through Rcpp). Fortran functions can be used directly in R but they are a little inefficient as they require copies of all the arguments, whereas C++ functions can operate through pointers, and building a package that passes the strict CRAN standards that requires building the Fortran library and then linking through C is very difficult.

In terms of process, there are Fortran to C++ automatic translators, such as fable. The output can then be manually edited. I do not know what standard of C++ it targets, however, I would be keen to aim to make use of tools in the standard library C++17 or even C++2x, particularly the tools to support memory safety. However, support for C++2x is limited in some settings (for example, it is not possible to use C++20 in R with the Eigen library - a popular linear algebra library - because CRAN rules have significantly delayed updating the R version of Eigen).

So my proposal is to create a header-only native C++ implementation of prima targeting C++17 (at least initially). I also propose adding a pre-compiler flag to allow for a R build version if needed through Rcpp and then to incorporate it into an R package to submit to CRAN. I will add some suggestions below about the API: to be maximally useful the main class will ideally need to handle binding free functions, member functions, and const member functions for minimisation while having no overhead for function calls.

zaikunzhang · 2024-01-09T19:08:34Z

Hi @samuel-watson ,

Thank you very much for raising this point.

A native, modern, and high-quality C++ implementation of PRIMA is of extreme importance to us. Such an implementation may replace the modern Fortran implementation as the code that actually does the computation when people use PRIMA. This will substantially improve the availability and usability of PRIMA. The benefit cannot be overestimated.

Many thanks.

Best regards,
Zaikun

zaikunzhang · 2024-01-09T19:23:12Z

My knowledge of C++ is limited and outdated. So I cannot do much coding for this implementation and will have to rely on the community, but I will be more than happy to assist in other ways.

Before we start, I would like to draw your attention to the following.

PRIMA is designed to solve derivative-free optimization problems. In such problems, the objective/constraint functions are black boxes defined by expensive simulations. Each function evaluation can take seconds, minutes, hours, days, or even months. Consequently, the cost incurred inside the solver does not matter. See here for more elaboration. Keeping this in mind, given an algorithm, when we need to choose between efficiency (in terms of flops, memory, etc) and simplicity&clarity of the code, we should always choose simplicity&clarity. (Note that the algorithm is given, so different implementations should consume the same number of function evaluations, which dominate the cost).
Something I wrote when I started to work on PRIMA:
https://fortran-lang.discourse.group/t/fortran-an-ideal-language-for-writing-templates-of-numerical-algorithms/2054
Coding as a way of documenting human knowledge:
https://everythingfunctional.wordpress.com/2021/01/19/communicating-with-whom/
Writing slow code:
https://fortran-lang.discourse.group/t/writing-slower-go-programs-bitfield-consulting/5733
Testing and verification play a central role in the development of PRIMA. We have to think about how to systematically verify the correctness of the C++ implementation. See here for more elaboration.
I avoided using derived types (i.e., structures) in the Fortran implementation. This is because they will become an obstacle when interfacing Fortran with other languages. I do not think this will be the case for the C++ implementation.
See here for the comments I made on the initiative of @nbelakovski to develop a native Python implementation: Creating a Python reference implementation #34
I have written some notes on translating PRIMA from Fortran to other languages.

Many thanks,
Zaikun

samuel-watson · 2024-01-10T16:40:24Z

Hi @zaikunzhang thanks for the detailed information. Just a couple of responses
0. I think this is broadly correct; however, reducing overhead does have some real world benefit in some scenarios. My interest comes from statistical model fitting and with the old BOBYQA implementation I can end up calling the function thousands of times in a single fit for complex models. So the cumulative effect of avoidable overheads can add up. However, I think the guiding principle that the API simple and clear like you say is key - I think both are achievable here.
4. Does there exist a good test suite of problems we can translate as well? We can also add some C++ specific unit tests.
7. These are really helpful. C++ doesn't really have matrices natively (although this might change for C++23/26) so row/column major ordering is often down to how the user implements it. That being said, it probably makes sense to use a well tested and widely used linear algebra library to avoid any pitfalls in trying to implement matrices using standard containers, and to ensure the best algorithms are used for linear algebra operations. With that in mind, Eigen is probably the industry standard and in my experience has been excellent to use both in terms of ease of use, speed/efficiency, and error catching.

zaikunzhang · 2024-01-21T19:56:41Z

I think this is broadly correct; however, reducing overhead does have some real world benefit in some scenarios. My interest comes from statistical model fitting and with the old BOBYQA implementation I can end up calling the function thousands of times in a single fit for complex models. So the cumulative effect of avoidable overheads can add up. However, I think the guiding principle that the API simple and clear like you say is key - I think both are achievable here.

Right. It is an optimization problem:

minimize overhead subject to code clarity and simplicity.

In this problem, the constraints (code clarity and simplicity) should be satisfied before we minimize the overhead. However, this does not mean that we do not care about overhead.

Does there exist a good test suite of problems we can translate as well? We can also add some C++ specific unit tests.

CUTEst is the standard test set. However, I am not sure how it works with C++.

These are really helpful. C++ doesn't really have matrices natively (although this might change for C++23/26) so row/column major ordering is often down to how the user implements it. That being said, it probably makes sense to use a well tested and widely used linear algebra library to avoid any pitfalls in trying to implement matrices using standard containers, and to ensure the best algorithms are used for linear algebra operations. With that in mind, Eigen is probably the industry standard and in my experience has been excellent to use both in terms of ease of use, speed/efficiency, and error catching.

I agree that Eigen is a good choice, unless there is intrinsic support for array operations in the language.

Thank you.

TiborGY · 2024-01-26T19:15:50Z

A temporary solution could be a C++ API to the current Fortran implementation. If NLOPT could use the implementations within PRIMA, that would immediately make it much easier to use PRIMA from C++ programs, and it would be a drop-in-replacement for any programs currently using these algorithms though NLOPT.

Furthermore, the improved local optimizer implementations would immediately also improve some of the global optimization algorithms in NLOPT
For example I am currently using NLOPT to run the MLSL global optimizer, which requires the specification of a local optimizer. Being able to use the improved version of BOBYQA in NLOPT would likely help MLSL locate good solutions faster.

zaikunzhang · 2024-01-26T20:34:07Z

A C++ API would be wonderful as well.

Given that we already have a C API, how difficult is it to get things to work in C++? It has been a long time since I used C/C++ last time, so excuse me for my ignorance.

TiborGY · 2024-01-27T12:39:24Z

A C++ API would be wonderful as well.

Given that we already have a C API, how difficult is it to get things to work in C++? It has been a long time since I used C/C++ last time, so excuse me for my ignorance.

I do not see any actual API documentation there. Examples are nice, but are not a substitute for API docs. IMO there should be a list of functions which are intended for outside use, some brief text for each of them about what they do, and some sort of API stability declaration, eg. "We will not remove these functions without having a deprecation notice in at least one release". It does not have to be worded like that of course, but it should be documented what the "contact surface" is and not change that without notice.
Doxygen is a classic tool for generating documentation from C/C++ source files.

I have not actually tried using PRIMA yet, as I mostly need global optimization, hence my use of NLOPT's MLSL, which AFAIK can only use one of NLOPT's local optimizers as the local optimizer. So at the moment I cannot really derive a benefit from PRIMA.
But in general, calling C functions from C++ is very easy.

zaikunzhang · 2024-01-27T12:50:29Z

Sure, there are many many documentions missing, not only for the C API. I hope I had enough time to work things out, in addition to my job, but I don't. For the moment I can only work on the Fortran backend and the MATLAB interface, while relying on community efforts for other parts.

samuel-watson · 2024-01-27T13:33:50Z

You can use the C interface in a C++ program as is, but the lack of an API documentation is right. However, my interest in a purely C++ version is to simplify compilation and facilitate incorporation of these really useful algorithms in more projects- I've given the examples of facilitating the use of these algorithms in programs for R as they could be added as a package for R as headers, even if only as an independent project. I would suggest API documentation should be raised as a separate issue.

zaikunzhang added the c++ Issues related to C++ label Jan 9, 2024

zaikunzhang mentioned this issue Jan 9, 2024

Add R binding #104

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Create a native (modern) C++ implementation of prima (to also use in R) #136

Create a native (modern) C++ implementation of prima (to also use in R) #136

samuel-watson commented Jan 9, 2024

zaikunzhang commented Jan 9, 2024

zaikunzhang commented Jan 9, 2024 •

edited

samuel-watson commented Jan 10, 2024

zaikunzhang commented Jan 21, 2024

TiborGY commented Jan 26, 2024

zaikunzhang commented Jan 26, 2024

TiborGY commented Jan 27, 2024

zaikunzhang commented Jan 27, 2024 •

edited

samuel-watson commented Jan 27, 2024

Create a native (modern) C++ implementation of prima (to also use in R) #136

Create a native (modern) C++ implementation of prima (to also use in R) #136

Comments

samuel-watson commented Jan 9, 2024

zaikunzhang commented Jan 9, 2024

zaikunzhang commented Jan 9, 2024 • edited

samuel-watson commented Jan 10, 2024

zaikunzhang commented Jan 21, 2024

TiborGY commented Jan 26, 2024

zaikunzhang commented Jan 26, 2024

TiborGY commented Jan 27, 2024

zaikunzhang commented Jan 27, 2024 • edited

samuel-watson commented Jan 27, 2024

zaikunzhang commented Jan 9, 2024 •

edited

zaikunzhang commented Jan 27, 2024 •

edited