Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question: Is it possible to rebuild the original input data from an rda object? #514

Open
MalteThodberg opened this issue May 11, 2022 · 5 comments

Comments

@MalteThodberg
Copy link

If i have an rda-object from the rda() function, it is possible to get out the orignial input data? Following the example from the rda() documentation:

# Fit RDA
data(dune)
data(dune.env)
dune.Manure <- rda(dune ~ Manure, dune.env)

# Extract fitted and residuals values
r <- residuals(dune.Manure)
f <- fitted(dune.Manure)

Now I would expect that r + f would give the original input dune, but that doesn't seem the be case:

# These are not the same!
head(dune)
head(f + r)

Is there a way a way to re-estimate the original input data from the rda-object?

@jarioksa
Copy link
Contributor

jarioksa commented May 12, 2022

Yes, you can do it, but there is no direct vegan function to do so. Functions fitted and residuals return data structures in the scale of observations so that rda(fitted(rda(dune ~ Manure, dune.env))) returns the same constrained component as rda(dune ~ Manure, dune.env). For additive components you should ask for internal or "working" form instead of the default "response" form, and then manually change the result to the "response" form you are asking. How to do this, depends on the ordination method, and CCA and RDA do this differently. However, you may peak at the function vegan:::fitted.rda to see how this done there. The key lines are at the end:

        Xbar <- Xbar * sqrt(nrow(Xbar) - 1)
        Xbar <- sweep(Xbar, 2, cent, "+")

So you have to undo the scaling of data in RDA (which we do to be consistent with stats::prcomp and stats::princomp and give the eigenvalues in terms of variance instead of raw crossproducts). Key point for your problem was the centring, which in your example is done twice, both for the fitted and residual results, and hence the data are off by double-centring instead of centring once.

The following is the slow way:

library(vegan)
data(dune, dune.env)
m <- rda(dune ~ Manure, dune.env)
f <- fitted(m, type = "working")
r <- residuals(m, type = "working")
fr <- f+r
cnt <- attr(fr, "scaled:center")
fr <- sqrt(nrow(fr) - 1) * fr
range(sweep(fr, 2, cnt, "+") - dune)
# [1] -2.220446e-15  2.664535e-15

Floating point calculations are not numerically exact, but you got within 2.66 × 10-15 of original values.

You can shortcut this by getting directly the sum of working matrices:

fr <- ordiYbar(m, "initial")
sweep(sqrt(nrow(fr)-1) * fr, 2, attr(fr, "scaled:center"), "+")

which is numerically more stable (max 4.44 × 10-16) as it avoids adding of rounded floating point numbers f+r. Finally, you can avoid sweep if you do:

f <- fitted(m) # involves centring and scaling up
r <- residuals(m, type="working")
f + sqrt(nrow(r)-1) * r

This avoids de-centring twice, but you need to scale up the residuals.

@MalteThodberg
Copy link
Author

Super cool - speed is not an issue for me in this case, so this is a very elegant solution.

Is there a similar way to obtain the input data if rda was called with scale=TRUE?

@jarioksa
Copy link
Contributor

Yes, you can rebuild the input data after scale=TRUE. The result also saves argument scaled:scale which is a multiplier for variables (species) to get the original response scale. Have a look at the function fitted.rda to see how this is done.

@MalteThodberg
Copy link
Author

Resurrecting this, since I have a very similar problem with CCA!

based on @jarioksa's excellent response above, this is what I ended up using:

object <- rda(dune ~ Manure, dune.env)

# fitted and residuals
f <- stats::fitted(object) # involves centring and scaling up
r <- stats::residuals(object, type="working")

# Scale residuals if needed
scal <- attr(r, "scaled:scale")

# Scale and cetner
if (!is.null(scal)) {
	r <- sweep(r, 2, scal, "*")
}
# Rebuild
o <- f + sqrt(nrow(r)-1) * r

Then a PCA can be run on the rebuild original data for the corresponding unconstrained analysis:

# Redo PCA, scale if necessary
if(is.null(attr(object$Ybar,"scaled:scale"))){
	new_object <- rda(o, scale=FALSE)
}else{
	new_object <- rda(o, scale=TRUE)
}

Is it possible to do the same with a CCA object from cca()?

@jarioksa
Copy link
Contributor

jarioksa commented Oct 24, 2023

@MalteThodberg : yes, it is possible. It is even easier to reconstitute input data in CCA than for RDA.

There are many ways, depending on how close to primitives you want to go. Here a method that is based on predict function:

## Preliminaries for all examples
library(vegan)
data(dune, dune.env)
m <- cca(dune ~ Manure, dune.env)
## get marginal weights and matrix grand total
rc <- outer(m$rowsum, m$colsum) # outer product of marginal weights
tot <- m$grand.total

## Method 1: use predict
xbar <- predict(m, model="CCA", type="working") + predict(m, model="CA", type="working")
(xbar * sqrt(rc) + rc) * tot

If you want to get closer to the primitives, you can get each component of xbar from ordination scores as:

## Method 2: use result scores and eigenvalues
xbar_CCA <- m$CCA$u %*% diag(sqrt(m$CCA$eig)) %*% t(m$CCA$v)
xbar_CA <- m$CA$u %*% diag(sqrt(m$CA$eig)) %*% t(m$CA$v)
xbar <- xbar_CCA + xbar_CA
(xbar + 1) * rc * tot

Finally, if you only want to reconstruct the initial input and you do not want to go to the basics, the easiest way is to extract the internal form with ordiYbar and deweight:

## Method 3: Use internal working data
(ordiYbar(m, "initial") * sqrt(rc) + rc) * tot

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants