
Feature Request: Enable (Multivariate) Sampling from errors #29

Open
billdenney opened this issue Aug 28, 2018 · 9 comments

Comments

@billdenney

What do you think about making functions like the following that would allow sampling from an errors object?

  • rnorm_errors(), pnorm_errors(), dnorm_errors(), qnorm_errors() drawing from a normal distribution
  • rt_errors(), pt_errors(), dt_errors(), qt_errors() drawing from a t distribution
  • rmvnorm_errors(), pmvnorm_errors(), dmvnorm_errors(), qmvnorm_errors() drawing from a multivariate normal distribution
  • rmvt_errors(), pmvt_errors(), dmvt_errors(), qmvt_errors() drawing from a multivariate t distribution

These could all likely be fairly simple wrappers around the mvtnorm package or base R functions.
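A minimal sketch of what the univariate wrapper might look like (rnorm_errors() is hypothetical and not part of the errors package; it assumes the quantity's value is used as the mean and its uncertainty as the standard deviation):

```r
library(errors)

# Hypothetical wrapper: draw k samples per element of an errors object.
# For k > 1, the result is a k x length(x) matrix of draws.
rnorm_errors <- function(x, k = 1) {
  mapply(function(m, s) rnorm(k, mean = m, sd = s),
         as.numeric(x), errors(x))
}

x <- set_errors(1:10, 1:10 * 0.05)
dim(rnorm_errors(x, k = 5))  # 5 x 10 matrix of draws
```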

@Enchufa2
Member

What do you mean by sampling from an errors object? And for what purpose?

@billdenney
Author

By sampling, I mean generating random samples from the distribution defined by an errors object: it defines a mean and a standard error (or standard deviation).

I often fit linear or nonlinear models (e.g. nlme::nlme()) and may want to sample from their parameter distributions to simulate the model with uncertainty. (This is a very similar use case to #28.)

@Enchufa2
Member

Something like the following?

library(errors)

x <- set_errors(1:10, 1:10*0.05)

mapply(rnorm, 5, x, errors(x))
#>           [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]
#> [1,] 0.9298000 1.904119 3.133269 4.370938 5.064903 6.012108 6.714207
#> [2,] 0.9731003 1.999266 3.029856 4.049640 5.410899 5.966489 6.496378
#> [3,] 0.9219417 1.803407 3.285455 3.780336 5.273075 5.825064 6.855276
#> [4,] 0.9981259 1.987456 3.178446 4.444459 5.058455 5.963717 7.161335
#> [5,] 1.0011889 1.988185 3.117229 4.031980 5.054416 5.904434 7.243851
#>          [,8]     [,9]     [,10]
#> [1,] 8.225904 8.805882  9.799824
#> [2,] 8.101031 8.624730 10.367774
#> [3,] 8.298141 8.566584  9.684903
#> [4,] 7.584533 9.322445  9.944816
#> [5,] 7.986738 8.805715 10.260714

@billdenney
Author

Just like that for the uncorrelated case. Perhaps the right way to do it is to just provide a single rnorm_errors() function that works as mvrnorm() if there is correlation or rnorm() if there is no correlation. That has the side benefit that the output is similar (a matrix) regardless of the input type.
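A rough sketch of that dispatching behaviour, assuming the covariance structure is passed in explicitly (the function name and signature are hypothetical; MASS::mvrnorm() handles the correlated case):

```r
# Hypothetical sketch: draw k samples given means `mu` and either a vector
# of standard deviations (uncorrelated case) or a full covariance matrix
# (correlated case). For k > 1 both branches return a k x length(mu) matrix.
rnorm_errors <- function(mu, sigma, k = 1) {
  if (is.matrix(sigma)) {
    MASS::mvrnorm(k, mu = mu, Sigma = sigma)
  } else {
    mapply(function(m, s) rnorm(k, mean = m, sd = s), mu, sigma)
  }
}

rnorm_errors(c(0, 10), c(1, 2), k = 3)        # independent draws
rnorm_errors(c(0, 10), diag(c(1, 4)), k = 3)  # same marginals via a covariance matrix
```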

@Enchufa2
Member

Mmmh... I'm not sure we are aligned on the dimensions here. Correlations are defined between two errors objects. The example above is univariate, because it's a single object.

If we call N the length of the errors object (different measurements of the same quantity), M the number of errors objects (different quantities), and K the number of draws required, the output would be a K x N x M array, in general. M=1 is the univariate case, as above, where K=5 and N=10.
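For the independent case, that general shape could be sketched like this (the list of objects and the stacking are illustrative, not an existing API):

```r
library(errors)

K <- 5                        # draws
x1 <- set_errors(1:10, 0.05)  # N = 10 measurements of one quantity
x2 <- set_errors(11:20, 0.05) # a second quantity, so M = 2

# Stack K x N matrices of draws into a K x N x M array.
draws <- vapply(list(x1, x2), function(o)
  mapply(rnorm, K, as.numeric(o), errors(o)),
  matrix(0, K, 10))

dim(draws)
#> [1]  5 10  2
```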

@billdenney
Author

I just looked more closely at the documentation, and I apparently wasn't paying enough attention to the correl() and covar() functions. I had assumed that covar() supported a general covariance matrix that could be assigned across the elements of an errors object, such that

x <- set_errors(1:3, 0.1*(1:3))

would be equivalent to

x <- set_errors(1:3, diag(0.1*(1:3)))

i.e., that the values of x are independent (uncorrelated). But I now see that is incorrect; set_errors() does not accept covariance matrices.

Do you have plans to support multivariate correlation matrices like the example above?

@Enchufa2
Member

It does allow for covariance matrices, but these matrices must be decomposed in pairwise covariances between errors objects. See help("errors-package") for examples of this.
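For example (a minimal sketch of the pairwise setters; the numbers are made up):

```r
library(errors)

x <- set_errors(1:5, 0.1)
y <- set_errors(5:1, 0.1)

# Assign pairwise correlations between the two objects, element by element.
correl(x, y) <- rep(0.5, 5)

# The corresponding pairwise covariances: 0.5 * 0.1 * 0.1 = 0.005 each.
covar(x, y)
```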

Elements of an errors vector are uncorrelated by design. The concept behind errors is the same as the one behind units objects, i.e., a vector represents different measurements of the same quantity (e.g., repeated measurements of the same value, the evolution of a value at different times...).

Let me illustrate this with an example. Let's say we measured the position x of an object at t=1,...,10 s:

library(errors)

x <- set_errors(1:10, 0.05)
t <- set_errors(1:10, 0.05)

Each measurement has some uncertainty (the same for all values here for simplicity). Now we can compute the distance covered in each interval, and then the instantaneous velocity, which is constant here:

dx <- diff(x)
dt <- diff(t)
(v <- dx/dt)
#> Errors: 0.1 0.1 0.1 0.1 0.1 ...
#> [1] 1 1 1 1 1 1 1 1 1

Obviously, there should be a strong correlation between the instantaneous velocity and the distance covered in each interval. And here it is (1/√2, since v depends equally on the two independent quantities dx and dt, which carry equal relative uncertainties):

correl(v, dx)
#> [1] 0.7071068 0.7071068 0.7071068 0.7071068 0.7071068 0.7071068 0.7071068 0.7071068 0.7071068

But measurements belonging to the same vector are conceptually uncorrelated by design. In the same way, a units vector has a single unit for all the values, and this is why the mixed_units feature required using lists.

We need to keep this aligned in order to orchestrate these packages in the quantities project. If we find strong use cases for in-vector correlations, a mixed_errors class, based on lists, could be implemented.

@billdenney
Author

Thank you for the explanation, and sorry that I missed it in the documentation.

I understand the design decision, and for my primary use cases, I need a covariance matrix like what you’re describing as mixed_errors.

When I have multiple measurements, I am often unable to assume that they are uncorrelated, because there is usually some autoregressive correlation between the values. As an example, were I to take my blood glucose 10 times over the next 10 hours, the measurements would be relatively correlated, and measurements taken adjacent to each other would be more strongly correlated.

As I’m thinking of it more, I like your implementation as is, and I would love to see a mixed_errors class. I would hope that (at some point) the mixed_errors class could make use of efficient matrix operations for the correlation (and maybe that already happens under the surface).

@Enchufa2
Member

Enchufa2 commented Aug 29, 2018

The advantage of having different vectorised variables that can be operated on freely, without having to build an expression (as in the propagate package, for example), comes with a small disadvantage: if you have something complex like (a+b) * sin(x), R evaluates a+b, then sin(x), and finally the product of the results. This means that uncertainty is propagated on a step-by-step basis. However, this is done with a Taylor expansion (already expanded; it doesn't take derivatives each time), so it's fast. Another issue is how to store the vectors of covariances, but this is resolved internally with a hash table and automatic clean-up when variables are removed, and I think it's quite efficient.

We could support Monte Carlo propagation at some point, but if someone needs that, the propagate package is probably more appropriate, because it provides coverage intervals and so on.
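To illustrate the step-by-step propagation on that expression (a minimal sketch; the values and uncertainties are made up):

```r
library(errors)

a <- set_errors(2, 0.1)
b <- set_errors(3, 0.1)
x <- set_errors(0.5, 0.01)

# R evaluates a+b, then sin(x), then the product; the (pre-expanded)
# first-order Taylor propagation happens at each intermediate step.
(a + b) * sin(x)
```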

I'll keep this mixed_errors idea in mind for future releases. First, we should release a new version of units to see whether mixed_units is well-received or requires changes or adjustments. And my priority is to release an initial version of quantities soon.

As always, thanks for all the effort and feedback, very useful!
