-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consistency in sign. #24
Comments
Thanks, this is a good suggestion. I will work in a solution to this, probably using the sign of the 1st element of the first u vector. |
Hey, I'm sorry to keep harping about this issue, but I found another source of inconsistency. The same data represented in slightly different way returns different signs even after setting the seed. Here's a reproducible example. sign_test <- function(x, method) {
# Synthetic data representing an scalar field in the
# southern hemisphere in a "tidy" format
data <- expand.grid(lon = seq(0, 360, by = 2),
lat = seq(-90, 0, by = 2),
date = 1:10)
set.seed(x)
data$z <- rnorm(nrow(data))
# The same data represented as a matrix differing only
# in the order of the columns
matrix1 <- as.matrix(reshape2::dcast(data, date ~ lon + lat, value.var = "z"))
matrix2 <- as.matrix(reshape2::dcast(data, date ~ lat + lon, value.var = "z"))
# Base svd has no problem
if (method == "base") {
r1 <- base::svd(matrix1, nv = 1, nu = 1)$v
r2 <- base::svd(matrix2, nv = 1, nu = 1)$v
} else {
set.seed(42)
r1 <- irlba::irlba(matrix1, nu = 1, nv = 1)$v
set.seed(42)
r2 <- irlba::irlba(matrix2, nu = 1, nv = 1)$v
}
sign(r1[1]) == sign(r2[1])
}
N <- 20
sapply(1:N, sign_test, method = "base") # all TRUE! :)
#> [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> [15] TRUE TRUE TRUE TRUE TRUE TRUE
sapply(1:N, sign_test, method = "irlba") # some FALSE! :(
#> [1] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE
#> [12] FALSE FALSE TRUE TRUE FALSE FALSE TRUE FALSE TRUE |
Thanks, this is in fact a different, more serious, problem than the sign issue above-- |
Sorry, this was not a bug, but it does show another example of the inconsistent sign issue that you already surmised abouve. You'll be happy that a partial fix for the inconsistent sign issue is ready, with a full version to follow soon (more on that below). But first, some notes on your example above. I had to think a bit about it. I don't think that the example illustrates things as clearly as it could, hence my initial confusion, because Here is a restatement of your example that builds a matrix with many more columns than rows using an explicitly known SVD. We then compute the SVD using R's version and irlba. It's really the same as your example, without the lat/long stuff and just focusing in on the specific issue...
So at this point,
However, at least R's native svd function is self-consistent with respect to sign:
This is really the crux of your issue report here in issue #24 -- irlba is unfortunately not even self-consistent :( shown next:
I agree that this is somewhat unsatisfactory, even though technically still correct. After thinking about this a lot, I am working on a surprisingly simple fix, that unfortunately is less simple than it seems and breaks some tests at the moment. I added a (for now, hidden) option to set the internally used RNG in the function. If we choose an RNG with uniform sign, then your problems above are mitigated:
Cazart! So this is a first draft/experimental of a fix for this issue. Unfortunately, it's not yet finished and is breaking a few things here and there. Try it out (it will be in the GitHub repository presently). For now, it's a hidden option with no documentation, etc. I am interested in thoughts on this approach, any help appreciated, including more of your great examples which have helped so far. Best, Bryan |
tests are breaking though, something not quite right yet...(in process)
Sorry, the test failure is due to an unrelated issue that I forgot about. I think this new rng=runif option will fix your issue. If you agree, we can make it an official option with doc. |
I sat down willing to test this and I now I understand your explanation of why my previous code was not correct. I was testing the sign of the first element of v, but that element does not represent the same point in space when the matrix is permuted (I come from the physical sciences, so my background in matrix algebra is rather shallow and I needed the physical dimension to understand things). After running proper tests, I see now that it is not a problem. Silly me 🤦♂️ . If I understand your fix correctly, it fixes the issue in my first comment without the need to specify the seed, is that correct? That's wonderful for reproducibility. While setting the seed is always a good idea, it's not ideal because the result is still dependent on the chosen seed. Is there any drawback of using Works like a charm data("volcano")
sapply(1:10, FUN = function(x) sign(irlba::irlba(volcano, nv = 1, rng = runif)$u[1]))
#> [1] 1 1 1 1 1 1 1 1 1 1 Even across seeds. 🎆 sapply(1:10, FUN = function(x) {
set.seed(x)
sign(irlba::irlba(volcano, nv = 1, rng = runif)$u[1])})
#> [1] 1 1 1 1 1 1 1 1 1 1 The only potential issue is that the sign is not consistent with sign(svd(volcano, nv = 1)$u[1]) == sign(irlba::irlba(volcano, nv = 1, rng = runif)$u[1])
#> [1] FALSE In this case, it's solved by changing the sign of the rng. sign(svd(volcano, nv = 1)$u[1]) == sign(irlba::irlba(volcano, nv = 1, rng = function(x) -runif(x))$u[1])
#> [1] TRUE But it seems to be data-dependent :( sign_test <- function(x) {
# Synthetic data representing an scalar field in the
# southern hemisphere in a "tidy" format
data <- expand.grid(lon = seq(0, 360, by = 2),
lat = seq(-90, 0, by = 2),
date = 1:10)
set.seed(x)
data$z <- rnorm(nrow(data))
matrix1 <- as.matrix(reshape2::dcast(data, date ~ lon + lat, value.var = "z"))
r1 <- base::svd(matrix1, nv = 1, nu = 1)$u
r2 <- irlba::irlba(matrix1, nu = 1, nv = 1, rng = function(x) -runif(x))$u
sign(r1[1]) == sign(r2[1])
}
N <- 20
sapply(1:N, sign_test)
#> [1] TRUE FALSE TRUE FALSE TRUE TRUE TRUE FALSE FALSE TRUE TRUE
#> [12] TRUE FALSE TRUE FALSE TRUE TRUE FALSE FALSE TRUE I don't know if the above inconsistency can be considered a problem or not. It seems to me that different results returned from different algorithms is a natural and unavoidable consequence of the non-uniqueness of SVD. If you think this is not worth changing, then I think this issue is safely closed. Thanks! |
I know that principal value decomposition is defined up to sign, but I think that for reproducibility,
irlba()
should be consistent. Ideally, it should follow the same convention as thebase::svd()
function. The workaround right now seems to set the seed each time irlba is run.Reproducible example:
(42 turns out to be the answer to the ultimate question of life, the universe, everything and consistent singular value decompositions)
The text was updated successfully, but these errors were encountered: