Question: Is it possible to rebuild the original input data from an rda object? #514

MalteThodberg · 2022-05-11T09:47:35Z

If i have an rda-object from the rda() function, it is possible to get out the orignial input data? Following the example from the rda() documentation:

# Fit RDA
data(dune)
data(dune.env)
dune.Manure <- rda(dune ~ Manure, dune.env)

# Extract fitted and residuals values
r <- residuals(dune.Manure)
f <- fitted(dune.Manure)

Now I would expect that r + f would give the original input dune, but that doesn't seem the be case:

# These are not the same!
head(dune)
head(f + r)

Is there a way a way to re-estimate the original input data from the rda-object?

The text was updated successfully, but these errors were encountered:

jarioksa · 2022-05-12T08:45:23Z

Yes, you can do it, but there is no direct vegan function to do so. Functions fitted and residuals return data structures in the scale of observations so that rda(fitted(rda(dune ~ Manure, dune.env))) returns the same constrained component as rda(dune ~ Manure, dune.env). For additive components you should ask for internal or "working" form instead of the default "response" form, and then manually change the result to the "response" form you are asking. How to do this, depends on the ordination method, and CCA and RDA do this differently. However, you may peak at the function vegan:::fitted.rda to see how this done there. The key lines are at the end:

        Xbar <- Xbar * sqrt(nrow(Xbar) - 1)
        Xbar <- sweep(Xbar, 2, cent, "+")

So you have to undo the scaling of data in RDA (which we do to be consistent with stats::prcomp and stats::princomp and give the eigenvalues in terms of variance instead of raw crossproducts). Key point for your problem was the centring, which in your example is done twice, both for the fitted and residual results, and hence the data are off by double-centring instead of centring once.

The following is the slow way:

library(vegan)
data(dune, dune.env)
m <- rda(dune ~ Manure, dune.env)
f <- fitted(m, type = "working")
r <- residuals(m, type = "working")
fr <- f+r
cnt <- attr(fr, "scaled:center")
fr <- sqrt(nrow(fr) - 1) * fr
range(sweep(fr, 2, cnt, "+") - dune)
# [1] -2.220446e-15  2.664535e-15

Floating point calculations are not numerically exact, but you got within 2.66 × 10^-15 of original values.

You can shortcut this by getting directly the sum of working matrices:

fr <- ordiYbar(m, "initial")
sweep(sqrt(nrow(fr)-1) * fr, 2, attr(fr, "scaled:center"), "+")

which is numerically more stable (max 4.44 × 10^-16) as it avoids adding of rounded floating point numbers f+r. Finally, you can avoid sweep if you do:

f <- fitted(m) # involves centring and scaling up
r <- residuals(m, type="working")
f + sqrt(nrow(r)-1) * r

This avoids de-centring twice, but you need to scale up the residuals.

MalteThodberg · 2022-05-14T18:42:23Z

Super cool - speed is not an issue for me in this case, so this is a very elegant solution.

Is there a similar way to obtain the input data if rda was called with scale=TRUE?

jarioksa · 2022-05-15T10:10:31Z

Yes, you can rebuild the input data after scale=TRUE. The result also saves argument scaled:scale which is a multiplier for variables (species) to get the original response scale. Have a look at the function fitted.rda to see how this is done.

MalteThodberg · 2023-10-20T11:11:48Z

Resurrecting this, since I have a very similar problem with CCA!

based on @jarioksa's excellent response above, this is what I ended up using:

object <- rda(dune ~ Manure, dune.env)

# fitted and residuals
f <- stats::fitted(object) # involves centring and scaling up
r <- stats::residuals(object, type="working")

# Scale residuals if needed
scal <- attr(r, "scaled:scale")

# Scale and cetner
if (!is.null(scal)) {
	r <- sweep(r, 2, scal, "*")
}
# Rebuild
o <- f + sqrt(nrow(r)-1) * r

Then a PCA can be run on the rebuild original data for the corresponding unconstrained analysis:

# Redo PCA, scale if necessary
if(is.null(attr(object$Ybar,"scaled:scale"))){
	new_object <- rda(o, scale=FALSE)
}else{
	new_object <- rda(o, scale=TRUE)
}

Is it possible to do the same with a CCA object from cca()?

jarioksa · 2023-10-24T11:51:02Z

@MalteThodberg : yes, it is possible. It is even easier to reconstitute input data in CCA than for RDA.

There are many ways, depending on how close to primitives you want to go. Here a method that is based on predict function:

## Preliminaries for all examples
library(vegan)
data(dune, dune.env)
m <- cca(dune ~ Manure, dune.env)
## get marginal weights and matrix grand total
rc <- outer(m$rowsum, m$colsum) # outer product of marginal weights
tot <- m$grand.total

## Method 1: use predict
xbar <- predict(m, model="CCA", type="working") + predict(m, model="CA", type="working")
(xbar * sqrt(rc) + rc) * tot

If you want to get closer to the primitives, you can get each component of xbar from ordination scores as:

## Method 2: use result scores and eigenvalues
xbar_CCA <- m$CCA$u %*% diag(sqrt(m$CCA$eig)) %*% t(m$CCA$v)
xbar_CA <- m$CA$u %*% diag(sqrt(m$CA$eig)) %*% t(m$CA$v)
xbar <- xbar_CCA + xbar_CA
(xbar + 1) * rc * tot

Finally, if you only want to reconstruct the initial input and you do not want to go to the basics, the easiest way is to extract the internal form with ordiYbar and deweight:

## Method 3: Use internal working data
(ordiYbar(m, "initial") * sqrt(rc) + rc) * tot

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Question: Is it possible to rebuild the original input data from an rda object? #514

Question: Is it possible to rebuild the original input data from an rda object? #514

MalteThodberg commented May 11, 2022

jarioksa commented May 12, 2022 •

edited

MalteThodberg commented May 14, 2022

jarioksa commented May 15, 2022

MalteThodberg commented Oct 20, 2023

jarioksa commented Oct 24, 2023 •

edited

Question: Is it possible to rebuild the original input data from an rda object? #514

Question: Is it possible to rebuild the original input data from an rda object? #514

Comments

MalteThodberg commented May 11, 2022

jarioksa commented May 12, 2022 • edited

MalteThodberg commented May 14, 2022

jarioksa commented May 15, 2022

MalteThodberg commented Oct 20, 2023

jarioksa commented Oct 24, 2023 • edited

jarioksa commented May 12, 2022 •

edited

jarioksa commented Oct 24, 2023 •

edited