
Rationale behind some of the bits and pieces in the C code #60

Open
LTLA opened this issue Jun 29, 2021 · 4 comments

@LTLA (Contributor) commented Jun 29, 2021

I've been writing a C++ port of irlba's C code (https://github.com/LTLA/CppIrlba) for use in some other applications of mine. It's been... tough going, but I think I've gotten it right, especially after cross-referencing to the Baglama and Reichel paper. Not that I really understand what's going on, but at least I know where the various steps are coming from.

That said, there are a few steps that I can't find in the paper, so perhaps you can shed some light on why they exist:

irlba/src/irlb.c

Lines 412 to 415 in 2ad759c

/* One step of classical Gram-Schmidt */
R = -R_F;
F77_NAME (daxpy) (&m, &R, W + j * m, &inc, W + (j + 1) * m,
&inc);

Is this necessary? I understand that Gram-Schmidt orthogonalizes the (j+1)-th column against the j-th column, but it's followed by a full reorthogonalization via orthog() that does the same thing, so what's the rationale for this extra step?
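For concreteness, here's roughly how I read that pair of operations, sketched in Eigen with my own function names; this is just my mental model, not the actual irlba or CppIrlba code.

```cpp
#include <Eigen/Dense>

// One classical Gram-Schmidt step against the previous column only:
// my reading of the daxpy above, i.e. W[, j+1] -= R_F * W[, j].
void one_cgs_step(Eigen::MatrixXd& W, int j, double R_F) {
    W.col(j + 1) -= R_F * W.col(j);
}

// Full reorthogonalization against all columns 0..j, which is my reading
// of what the orthog() call immediately afterwards does.
void full_reorthogonalization(Eigen::MatrixXd& W, int j) {
    Eigen::VectorXd coef = W.leftCols(j + 1).transpose() * W.col(j + 1);
    W.col(j + 1) -= W.leftCols(j + 1) * coef;
}
```

As far as I can tell, the second operation already covers the column handled by the first, which is what prompted the question.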

FOO = RNORM (n);

I'm guessing that the various RNORM() calls are intended to just fill W and V with some kind of content when the norm is zero; otherwise we'd end up with some nonsense (indefinite values or all-zero values). I'd further guess that the arbitrariness of the values doesn't really matter here because the associated singular values will be zero after the SVD in the main IRLBA loop.
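If my guess is right, the intent would be something like the following Eigen sketch; refill_if_degenerate and the tolerance handling are my own invention here, not a transcription of the C code.

```cpp
#include <Eigen/Dense>

// Guess at the intent of the RNORM() branches: when the residual norm has
// numerically vanished, refill the vector with arbitrary random content
// before normalizing, instead of dividing by (nearly) zero. The specific
// values shouldn't matter if the associated singular value ends up as zero.
void refill_if_degenerate(Eigen::VectorXd& F, double eps) {
    double nrm = F.norm();
    if (nrm < eps) {
        F = Eigen::VectorXd::Random(F.size()); // stand-in for the RNORM() fill
        nrm = F.norm();
    }
    F /= nrm;
}
```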

I probably have a few other questions that will come to me as I stare at the code and paper for a while longer. Of course, any of your thoughts on the current state of the port are also appreciated.

@bwlewis (Owner) commented Jun 30, 2021 via email

@bwlewis (Owner) commented Jun 30, 2021 via email

@LTLA (Contributor, Author) commented Jul 1, 2021

answers back to front...

> p.s. would you consider being also a co-maintainer of the original irlba package since I am so negligent in updates sometimes? You're into this code quite a bit and have made many valuable contributions already...

I'm flattered but I don't think I have enough understanding of the theory to (co-)maintain it. I already knew my linear algebra was weak, but gee, attempting to read the paper really drove it home. I suspect that there is a decade of study between where I am now and the point at which I could confidently make improvements/fixes to the code. (For example, there were many lines in irlb.c that I thought, "this can't possibly be right, I'll complain to Bryan about this"... but it was, so I guess that's me told.)

> That restart stuff remains to be re-implemented someday, so be aware of that.

Sure. I recently threw in some GHA tests to compare my implementation against the R package, which should allow me to keep track of changes on your end. (Assuming my test scenarios are comprehensive enough to trigger any changes in the restarts.)

> I have been completely side-tracked for like a year by some health problems though.

Ah. Hope all is well.

> Separately I will put a completely separate C code library version that I have up somewhere (similar to your C++ library).

I wonder if there is a way for us to combine our implementations so we can share the maintenance burden for the core logic. The code should compile in both C and C++; the problem is more about the dependencies during build. I'm eventually compiling for a system without BLAS or LAPACK, hence the use of Eigen (which makes compilation v-e-r-y slow)...

I'd have to think about that for a bit.

> See these old paper review notes for more if you're interested:

I'm afraid that's several levels too deep for me, though I do appreciate the description of the distinction from rsvd(), having worked on C++ ports of both algorithms.

On a related note, if you were to ever write an "IRLBA for dummies", I would definitely read it.

> I recommend just always re-orthogonalizing the basis in practice

Will do. So if I'm always doing the full reorthogonalization... is there a need for the Gram-Schmidt step at all? I did some empirical testing where its removal didn't change the result IIRC, but I wasn't sure whether there was a deeper purpose to that step.
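For what it's worth, the empirical test I mean is a much-simplified version of the comparison below (my own throwaway check, not anything from either code base): orthogonalize a candidate column against an orthonormal basis once with a preceding single-column Gram-Schmidt step and once without, and compare.

```cpp
#include <Eigen/Dense>
#include <iostream>

int main() {
    const int m = 50, j = 4;  // arbitrary sizes for the check

    // Build an orthonormal m x (j+1) basis W via a thin QR of a random matrix.
    Eigen::MatrixXd A = Eigen::MatrixXd::Random(m, j + 1);
    Eigen::HouseholderQR<Eigen::MatrixXd> qr(A);
    Eigen::MatrixXd W = qr.householderQ() * Eigen::MatrixXd::Identity(m, j + 1);

    // Candidate next column, before any orthogonalization.
    Eigen::VectorXd w = Eigen::VectorXd::Random(m);

    // Route 1: single Gram-Schmidt step against column j, then full reorthogonalization.
    Eigen::VectorXd a = w - W.col(j).dot(w) * W.col(j);
    Eigen::VectorXd a_full = a - W * (W.transpose() * a);

    // Route 2: full reorthogonalization only.
    Eigen::VectorXd b = w - W * (W.transpose() * w);

    // Difference should be on the order of machine precision, since column j
    // lies in the span that the full reorthogonalization projects out anyway.
    std::cout << (a_full - b).norm() << "\n";
}
```

That said, this only looks at a single vector in isolation, so it may well miss whatever numerical role the extra step plays inside the full Lanczos recurrence.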

@bwlewis (Owner) commented Jul 4, 2021 via email
