
Some facts

  • the linalg API has saturated and doesn't change much these days. This means we mostly need the types of calls that are already in there: a number of dot products, linear solves, and factorizations (see the sketch after this list)
  • compile time and memory consumption of the linalg modules are now a problem
  • Maintenance is a problem as the namespace/macro files are very big (changes are costly in labor)
  • GPU features of linalg are not used in algorithms
  • eigen3 is the only backend
  • autodiff won't work with the current system
  • due to the lack of expression trees at compile or runtime, no heterogeneous computing devices can be exploited
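
To make that call surface concrete, here is a rough sketch of those three categories written directly against Eigen (purely illustrative; Shogun's linalg wrappers have their own names and signatures):

```cpp
#include <Eigen/Dense>
#include <iostream>

int main() {
    Eigen::MatrixXd A = Eigen::MatrixXd::Random(4, 4);
    Eigen::VectorXd b = Eigen::VectorXd::Random(4);

    // dot products / matrix-vector products
    double d = b.dot(b);
    Eigen::VectorXd y = A * b;

    // linear solves
    Eigen::VectorXd x = A.fullPivLu().solve(b);

    // factorizations, e.g. Cholesky of a symmetric positive definite matrix
    Eigen::MatrixXd S = A * A.transpose() + Eigen::MatrixXd::Identity(4, 4);
    Eigen::LLT<Eigen::MatrixXd> llt(S);
    Eigen::MatrixXd L = llt.matrixL();

    std::cout << d << " " << y.norm() << " " << x.norm() << " " << L(0, 0) << "\n";
    return 0;
}
```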

Pro eigen

  • simplicity, less code, outsourcing
  • linalg has many issues even though we tried to write something that lasts. Maybe we shouldn't do this ourselves? This in particular includes the GPU/CPU/mix stuff
  • eigen3 is stable and here to stay, e.g. it is a dependency of tensorflow; back then eigen3 was way more niche than it is now
  • compilation of linalg is slow and memory intensive, to the point of crashing some machines
  • eigen's DSL is much cleaner than our linalg API, which is quite cumbersome since it is not OOP, especially when chaining non-trivial dot products (transpose flags have to be passed as function arguments); see the first sketch after this list
  • a solution for GPU/CPU/etc. will probably be built on top of eigen (by somebody else), see e.g. SYCL
  • we like compilers, and shogun modules are usually fixed, so why not do compile-time-optimized linear algebra
  • Eigen has minimal autodiff support built in (for scalars), see Gil's patch and the second sketch after this list. We could probably extend this to vector-valued expressions
    • Even though it is an unsupported module, we could write our own version of AutoDiffScalar and add more of the functionality that exists in Stan
    • the problem with StanMath is that it's very slow when reusing the same gradient calculation, as we do here. On the other hand, Stan supports a lot of reverse-mode AD functionality.
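
To make the DSL point concrete, here is a hedged sketch of chaining A^T * B * x. The flag-based matrix_prod below is an invented stand-in for a non-OOP linalg-style call, not Shogun's actual signature; the Eigen version reads like the math and builds a single lazy expression:

```cpp
#include <Eigen/Dense>
#include <iostream>

// Invented flag-based helper in the spirit of a non-OOP linalg namespace;
// the name and signature are assumptions for this sketch, not Shogun's actual API.
Eigen::MatrixXd matrix_prod(const Eigen::MatrixXd& A, const Eigen::MatrixXd& B,
                            bool transpose_A, bool transpose_B) {
    const Eigen::MatrixXd lhs = transpose_A ? Eigen::MatrixXd(A.transpose()) : A;
    const Eigen::MatrixXd rhs = transpose_B ? Eigen::MatrixXd(B.transpose()) : B;
    return lhs * rhs;
}

int main() {
    Eigen::MatrixXd A = Eigen::MatrixXd::Random(3, 4);
    Eigen::MatrixXd B = Eigen::MatrixXd::Random(3, 4);
    Eigen::VectorXd x = Eigen::VectorXd::Random(4);

    // Flag-based chaining of A^T * B * x: booleans for transposes,
    // explicit temporaries between calls.
    Eigen::MatrixXd AtB = matrix_prod(A, B, /*transpose_A=*/true, /*transpose_B=*/false);
    Eigen::VectorXd r1 = matrix_prod(AtB, x, false, false);

    // Eigen's DSL: reads like the math and builds one lazy expression.
    Eigen::VectorXd r2 = A.transpose() * B * x;

    std::cout << (r1 - r2).norm() << "\n";  // ~0
    return 0;
}
```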
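
On the autodiff point, a minimal sketch of Eigen's unsupported AutoDiffScalar module (forward mode, scalar-valued); the function f below is an arbitrary example chosen for illustration:

```cpp
#include <unsupported/Eigen/AutoDiff>
#include <iostream>

int main() {
    // Forward-mode AD: each scalar carries a value plus a derivative vector.
    using ADScalar = Eigen::AutoDiffScalar<Eigen::Vector2d>;

    // Two inputs, seeded with unit derivative vectors: (value, #inputs, index).
    ADScalar x(1.5, 2, 0);
    ADScalar y(0.5, 2, 1);

    // f(x, y) = x*y + sin(x) -- an arbitrary example function.
    ADScalar f = x * y + sin(x);

    std::cout << "f            = " << f.value() << "\n";
    std::cout << "df/dx, df/dy = " << f.derivatives().transpose() << "\n";  // [y + cos(x), x]
    return 0;
}
```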

Con eigen

  • We had eigen to replace lapack, then we built linalg. The point was not to depend on a single lib, but to have something that can be easily interchanged (even at runtime). There is a danger that tomorrow a new lib comes along and we have to refactor everything again.
    • The question is whether this is feasible, as we would have to make assumptions about how a future lib will work and anticipate designs ... very difficult
    • An example of how difficult this anticipation is, is our GPU stuff ... which we never even made work.
    • With unlimited manpower we could build the thing we want. But we don't have that.
    • compile time increases, but the workload would be spread across the translation units (as opposed to the eigen linalg situation now)
    • plugins would at least allow compiling only the algorithms that are wanted

The optimal solution

  • Expression trees built at runtime that are JIT'ed (tensorflow XLA JIT style); a minimal tree-building sketch follows after this list

    • allows for easy autodiff
    • should be as fast as compiled (potentially faster)
    • writing this ourselves is nuts, do frameworks for this exist? (something like this)
    • need to refactor all algos
  • Expression trees built at compile time, relying on the compiler to optimize/distribute; see the expression-template sketch after this list

    • should allow for autodiff (that's what eigen's autodiff does)
    • still fast
    • fits shogun better, as our models are fixed at compile time -> less heavy refactoring, if any
    • easier to implement and integrates well with the Eigen lazy evaluation pattern
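
To make the first option concrete, here is a minimal, hedged sketch of a runtime expression tree; the node set is invented for illustration. A real system would hand such a graph to a JIT backend (XLA-style) instead of interpreting it with a tree walk as done here:

```cpp
#include <iostream>
#include <memory>

// Expression nodes assembled at runtime; evaluation is a simple tree walk.
struct Expr {
    virtual ~Expr() = default;
    virtual double eval() const = 0;
};
using ExprPtr = std::shared_ptr<Expr>;

struct Constant : Expr {
    double value;
    explicit Constant(double v) : value(v) {}
    double eval() const override { return value; }
};

struct Add : Expr {
    ExprPtr lhs, rhs;
    Add(ExprPtr l, ExprPtr r) : lhs(std::move(l)), rhs(std::move(r)) {}
    double eval() const override { return lhs->eval() + rhs->eval(); }
};

struct Mul : Expr {
    ExprPtr lhs, rhs;
    Mul(ExprPtr l, ExprPtr r) : lhs(std::move(l)), rhs(std::move(r)) {}
    double eval() const override { return lhs->eval() * rhs->eval(); }
};

int main() {
    // (2 + 3) * 4, assembled at runtime; a JIT backend would compile this
    // graph instead of walking it.
    ExprPtr e = std::make_shared<Mul>(
        std::make_shared<Add>(std::make_shared<Constant>(2.0),
                              std::make_shared<Constant>(3.0)),
        std::make_shared<Constant>(4.0));
    std::cout << e->eval() << "\n";  // prints 20
    return 0;
}
```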
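
And a correspondingly minimal sketch of the second option: expression trees built at compile time via expression templates, the CRTP pattern behind Eigen's lazy evaluation. The whole tree lives in the type, so the compiler can inline and fuse the evaluation; the node names are again just illustrative:

```cpp
#include <iostream>

// CRTP base: every expression knows its concrete type at compile time.
template <typename Derived>
struct Expr {
    const Derived& self() const { return static_cast<const Derived&>(*this); }
};

struct Constant : Expr<Constant> {
    double value;
    explicit Constant(double v) : value(v) {}
    double eval() const { return value; }
};

// Children are stored by value so temporaries in compound expressions stay valid.
template <typename L, typename R>
struct Add : Expr<Add<L, R>> {
    L lhs; R rhs;
    Add(const L& l, const R& r) : lhs(l), rhs(r) {}
    double eval() const { return lhs.eval() + rhs.eval(); }
};

template <typename L, typename R>
struct Mul : Expr<Mul<L, R>> {
    L lhs; R rhs;
    Mul(const L& l, const R& r) : lhs(l), rhs(r) {}
    double eval() const { return lhs.eval() * rhs.eval(); }
};

template <typename L, typename R>
Add<L, R> operator+(const Expr<L>& l, const Expr<R>& r) { return {l.self(), r.self()}; }

template <typename L, typename R>
Mul<L, R> operator*(const Expr<L>& l, const Expr<R>& r) { return {l.self(), r.self()}; }

int main() {
    Constant a(2.0), b(3.0), c(4.0);
    // The type of e encodes the whole tree: Add<Constant, Mul<Constant, Constant>>,
    // so evaluation can be inlined and fused by the compiler.
    auto e = a + b * c;
    std::cout << e.eval() << "\n";  // prints 14
    return 0;
}
```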