
Should we allow C++? #988

Open
thijssteel opened this issue Feb 23, 2024 · 5 comments

Comments

@thijssteel
Collaborator

I have been working on a pet project. What I would like to do is introduce some C++ code in LAPACK, with a matrix class to represent the matrices instead of a bare pointer. This is not meant as a way for people to have a nice API to call LAPACK (there are plenty of libraries that provide nice wrappers), but to make things easier for developers in the future. It would allow for comprehensive bounds checks that even work for subarrays (via asserts of course, which we should disable in a production build).
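To make the idea concrete, here is a minimal sketch of what such a matrix class could look like. It is not taken from the lapackv4 proof of concept; the name `MatrixView` and its member functions are hypothetical, and it assumes column-major storage with a leading dimension, as in LAPACK.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical non-owning view of a column-major matrix (illustration only,
// not the lapackv4 API). All checks are asserts, so they compile away when
// NDEBUG is defined, e.g. in a production build.
template <typename T>
class MatrixView {
public:
    MatrixView(T* data, std::size_t rows, std::size_t cols, std::size_t ld)
        : data_(data), rows_(rows), cols_(cols), ld_(ld)
    {
        assert(ld >= rows);  // the leading dimension must cover a full column
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

    // Element access, checked against the bounds of this view.
    T& operator()(std::size_t i, std::size_t j) const
    {
        assert(i < rows_);
        assert(j < cols_);
        return data_[i + j * ld_];
    }

    // View of rows [i0, i1) and columns [j0, j1). It shares storage with the
    // parent but carries its own, smaller bounds.
    MatrixView submatrix(std::size_t i0, std::size_t i1,
                         std::size_t j0, std::size_t j1) const
    {
        assert(i0 <= i1 && i1 <= rows_);
        assert(j0 <= j1 && j1 <= cols_);
        return MatrixView(data_ + i0 + j0 * ld_, i1 - i0, j1 - j0, ld_);
    }

private:
    T* data_;
    std::size_t rows_, cols_, ld_;
};
```

With a raw pointer and a leading dimension, indexing one row past the end of a subarray silently lands in the parent matrix. Here the subview knows its own extent, so the same mistake trips an assert in a debug build.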

The way I want to achieve this is by mixing C++ and Fortran, so that we don't have to suffer through the immense task of translating all of LAPACK at once. We could just stick to C++ for new algorithms or for routines that need to be reworked or debugged (for example, dgesdd, see #672).
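As a rough sketch of how a C++ routine could sit next to existing Fortran callers: the routine name `demo_dscal_` is hypothetical, and the sketch assumes the common (but compiler-dependent) convention of a lowercase symbol with a trailing underscore and every argument passed by reference. A portable build would more likely declare the interface on the Fortran side with `bind(C)`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical routine computing x := alpha * x with a Fortran 77 style
// interface. extern "C" disables C++ name mangling; the trailing underscore
// and pass-by-reference arguments are ASSUMPTIONS about the Fortran
// compiler's calling convention, not guarantees.
extern "C" void demo_dscal_(const int* n, const double* alpha,
                            double* x, const int* incx)
{
    assert(n != nullptr && alpha != nullptr && incx != nullptr);
    if (*n <= 0 || *incx <= 0) return;  // quick return, as BLAS routines do
    assert(x != nullptr);
    for (int i = 0; i < *n; ++i) {
        const std::ptrdiff_t k = static_cast<std::ptrdiff_t>(i) * (*incx);
        x[k] *= *alpha;
    }
}
```

Under that naming assumption, Fortran code could call this as `call demo_dscal(n, alpha, x, incx)`, while the body is free to use C++ internally.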

A proof of concept implementation is available at https://github.com/thijssteel/lapackv4

I appreciate all feedback.

@martin-frbg
Collaborator

I suspect this will create portability problems for some - finding a Fortran compiler for some platforms is hard enough, but having both Fortran and C++ as dependencies in a project just from the reference implementation of a standard API may be a bit much (and by the time everything got rewritten, users may have moved on to newer APIs that drop the Fortran naming constraints)

@thijssteel
Collaborator Author

Thanks for the feedback

> finding a Fortran compiler for some platforms is hard enough

While this is true, I assumed that most platforms that have a Fortran compiler also have a C++ compiler. It may not be a cutting-edge one with C++20 support, but we could restrict which features we use. Do you have examples? I assume you know more about those issues than me from your work on OpenBLAS.

@ilayn
Contributor

ilayn commented Feb 23, 2024

I never thought I would see this day 😃 and I am very conflicted about this. I do feel that getting out of F77 is an absolute win. However, going into C++ scares me and smells like yet another trap that we will regret ten years from now.
Newer Fortran (Fortran 90 and later) is there, but I think you folks have already considered that option, and I am not sure it solves the problems you mentioned in the issue, so I'll skip that.

I definitely do not want to start a language flamewar, but C++ does not feel like an archival language the way Fortran or C does, and it is getting larger and weirder with each standard update. However, there is a great deal of industry-grade C++ code, so I would refrain from dismissing it. Having said that, in the same breath, I would also say that C++ is not an array-friendly number-crunching language.

There is even a new linear algebra addition accepted for C++26, std::linalg (https://www.youtube.com/watch?v=-UXHMlAMXNk). I wholeheartedly welcome these additions and have huge respect for the people working thanklessly on these topics. However, if that work lands in C++ and this work also goes in, it is going to be a very strange situation. We would have mdspan, std::vector, Eigen, and reference-lapack, and nothing would work together because... C++. So it is probably better to use the newest standard (C++23 or later) to benefit from those additions, but then @martin-frbg is right about the lack of compilers. If we don't support the new features, then we automatically have brand-new legacy code. It is indeed a very tricky situation.

Over at SciPy, for many reasons, we started to move our entire F77 codebase out of F77 (scipy/scipy#18566) and we are covering quite some distance. We started to use C just to have a very neutral codebase (with all error and warning flags enabled, and carefully avoiding any opinionated parts of C) in case we want to jump to another language in the near future. In fact, I am gathering the courage to write BLAS in C/Rust myself as a pet project once the SciPy work is finished. So please let me know if you need an extra hand. The time seems ripe for it.

I know Rust is not ready for everything, but these new languages offer quite robust codebases with very little room for the strange memory issues we always have with lwork et al. I am not proposing that we jump on the bandwagon, but having native support for threading, SIMD, GPU, and other goodies seems worth shooting for. The critical issue, I can imagine, is the actual experts not knowing enough Rust. There are also multiple native attempts in Rust by different folks that suggest the language is already competitive without too much low-level wizardry, for example https://github.com/sarah-ek/faer-rs

Anyways, thank you even for considering this option though.

@christoph-conrads
Contributor

christoph-conrads commented Feb 24, 2024

The major problems with Fortran are the massively error-prone code duplication (real/complex, single/double), error-prone variable initialization, and the lack of conditional compilation, including assertions and initializing memory in debug mode with, e.g., NaN. Even better: to this day, gfortran lets me compile, without warnings, code that reads undeclared and uninitialized variables, even when -Wextra -Wall is passed on the command line.

I support C++ as a replacement because it fixes all of the mentioned problems (see for example my C++ header-only library generating uniformly distributed floating-point numbers, for which there exists only one generic implementation). C++11 and C++14 compilers are widely available, including on the supercomputers that I had access to. A major challenge with C++ is the number of language features (virtual inheritance, concepts, variadic template arguments, undefined vs. unspecified behavior...) and the complex semantics. In this regard, I suggest limiting use to a subset of modern C++ ("modern" meaning C++11 or newer).
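A small sketch of how the duplication and debug-initialization points might look in C++11. Both `max_abs` and `make_workspace` are hypothetical helpers written for this illustration, not existing LAPACK routines.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <limits>
#include <vector>

// One generic implementation covering float, double, std::complex<float>,
// and std::complex<double>, instead of four hand-copied routines.
// Returns the largest absolute value in x (0 for an empty vector).
template <typename T>
auto max_abs(const std::vector<T>& x) -> decltype(std::abs(T{}))
{
    using Real = decltype(std::abs(T{}));
    Real result = 0;
    for (const T& v : x) {
        const Real a = std::abs(v);
        if (a > result) result = a;
    }
    return result;
}

// Conditional compilation: in debug builds the workspace is poisoned with
// NaN so that reading an entry before writing it shows up in the results.
// With NDEBUG defined, std::vector zero-initializes the entries instead.
inline std::vector<double> make_workspace(std::size_t n)
{
#ifndef NDEBUG
    return std::vector<double>(n, std::numeric_limits<double>::quiet_NaN());
#else
    return std::vector<double>(n);
#endif
}
```

The trailing return type keeps this within C++11, in line with the suggestion to stick to a widely supported subset.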

> I suspect this will create portability problems for some - finding a Fortran compiler for some platforms is hard enough, but having both Fortran and C++ as dependencies in a project just from the reference implementation of a standard API may be a bit much (and by the time everything got rewritten, users may have moved on to newer APIs that drop the Fortran naming constraints)

In my experience, finding a Fortran compiler is the hard part, especially when building for Android. LLVM became the default toolchain in the Android Native Development Kit (NDK) with release 13b in 2016 (see the NDK release notes); GCC was finally removed in r18b (2018). The f18 Fortran compiler became part of LLVM only in 2020, see Flang and F18.

Finding a feature-complete(!) C++20/C++23 compiler is harder though, cf. Cppreference: C++ Compiler Support.

One question comes to mind though: What is the difference between LAPACK with C++ and the tlapack project then?

@thijssteel
Collaborator Author

As one of its main authors, I know TLAPACK quite well. While I love the project, I don't think the direction in which it is currently headed makes it suitable as a replacement for reference-lapack. The value of TLAPACK lies more in being able to use completely different layouts relatively seamlessly (not just row or column major, but more like tiled, block cyclic, distributed through StarPU, ...). The same templated nature that makes it powerful also means that codes need to work for both owning and non-owning matrix classes, half precision, ...

I also realized that limiting ourselves to C++14 means no if constexpr from C++17. That has already proven to be essential in TLAPACK for dealing with real and complex code. Maybe we can still use that.
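For readers unfamiliar with the feature, here is a minimal sketch of the kind of real/complex branching that if constexpr enables. The names `is_complex` and `conj_if_complex` are hypothetical and not taken from TLAPACK; the code requires C++17.

```cpp
#include <cassert>
#include <complex>
#include <type_traits>

// Hypothetical trait: true only for std::complex<...>.
template <typename T>
struct is_complex : std::false_type {};
template <typename T>
struct is_complex<std::complex<T>> : std::true_type {};

// Conjugate for complex types, identity for real types.
// The branch not taken is discarded at compile time, so it does not have
// to type-check for the other kind of T.
template <typename T>
T conj_if_complex(const T& x)
{
    if constexpr (is_complex<T>::value)
        return std::conj(x);
    else
        return x;
}
```

With a plain `if`, the instantiation for `double` would fail to compile, because `std::conj` applied to a `double` returns a `std::complex<double>`, which does not convert implicitly to `double`. In C++14 the same effect needs overloads or tag dispatch, which spreads one routine over several functions.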


4 participants