
Feature Request: Enable (Multivariate) Sampling from errors #29

Open
billdenney opened this issue Aug 28, 2018 · 9 comments

Comments

@billdenney

What do you think about making functions like the following that would allow sampling from an errors object?

  • rnorm_errors(), pnorm_errors(), dnorm_errors(), qnorm_errors() drawing from a normal distribution
  • rt_errors(), pt_errors(), dt_errors(), qt_errors() drawing from a t distribution
  • rmvnorm_errors(), pmvnorm_errors(), dmvnorm_errors(), qmvnorm_errors() drawing from a multivariate normal distribution
  • rmvt_errors(), pmvt_errors(), dmvt_errors(), qmvt_errors() drawing from a multivariate t distribution

These could all likely be fairly simple wrappers around the mvtnorm package or base R functions.
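A minimal sketch of what the univariate wrapper might look like (rnorm_errors() is hypothetical and not part of the errors package; it assumes the quantity's value is used as the mean and its uncertainty as the standard deviation):

```r
library(errors)

# Hypothetical wrapper: draw k samples per element of an errors object.
# For k > 1, the result is a k x length(x) matrix of draws.
rnorm_errors <- function(x, k = 1) {
  mapply(function(m, s) rnorm(k, mean = m, sd = s),
         as.numeric(x), errors(x))
}

x <- set_errors(1:10, 1:10 * 0.05)
dim(rnorm_errors(x, k = 5))  # 5 x 10 matrix of draws
```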

@Enchufa2
Member

What do you mean by sampling from an errors object? And for what purpose?

@billdenney
Author

By sampling, I mean generating random samples from the distribution defined by an errors object: it defines a mean and a standard error (or standard deviation).

I often fit linear or nonlinear models (e.g. nlme::nlme()) and may want to sample from their parameter distributions to simulate the model with uncertainty. (This is a very similar use case to #28.)

@Enchufa2
Member

Something like the following?

library(errors)

x <- set_errors(1:10, 1:10*0.05)

mapply(rnorm, 5, x, errors(x))
#>           [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]
#> [1,] 0.9298000 1.904119 3.133269 4.370938 5.064903 6.012108 6.714207
#> [2,] 0.9731003 1.999266 3.029856 4.049640 5.410899 5.966489 6.496378
#> [3,] 0.9219417 1.803407 3.285455 3.780336 5.273075 5.825064 6.855276
#> [4,] 0.9981259 1.987456 3.178446 4.444459 5.058455 5.963717 7.161335
#> [5,] 1.0011889 1.988185 3.117229 4.031980 5.054416 5.904434 7.243851
#>          [,8]     [,9]     [,10]
#> [1,] 8.225904 8.805882  9.799824
#> [2,] 8.101031 8.624730 10.367774
#> [3,] 8.298141 8.566584  9.684903
#> [4,] 7.584533 9.322445  9.944816
#> [5,] 7.986738 8.805715 10.260714

@billdenney
Author

Just like that for the uncorrelated case. Perhaps the right way to do it is to just provide a single rnorm_errors() function that works as mvrnorm() if there is correlation or rnorm() if there is no correlation. That has the side benefit that the output is similar (a matrix) regardless of the input type.
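A rough sketch of that dispatching behaviour, assuming the covariance structure is passed in explicitly (the function name and signature are hypothetical; MASS::mvrnorm() handles the correlated case):

```r
# Hypothetical sketch: draw k samples given means `mu` and either a vector
# of standard deviations (uncorrelated case) or a full covariance matrix
# (correlated case). For k > 1 both branches return a k x length(mu) matrix.
rnorm_errors <- function(mu, sigma, k = 1) {
  if (is.matrix(sigma)) {
    MASS::mvrnorm(k, mu = mu, Sigma = sigma)
  } else {
    mapply(function(m, s) rnorm(k, mean = m, sd = s), mu, sigma)
  }
}

rnorm_errors(c(0, 10), c(1, 2), k = 3)        # independent draws
rnorm_errors(c(0, 10), diag(c(1, 4)), k = 3)  # same marginals via a covariance matrix
```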

@Enchufa2
Member

Mmmh... I'm not sure we are aligned on the dimensions here. Correlations are defined between two errors objects. The example above is univariate, because it's a single object.

If we call N the length of the errors object (different measurements of the same quantity), M the number of errors objects (different quantities), and K the number of draws required, the output would be a K x N x M array, in general. M=1 is the univariate case, as above, where K=5 and N=10.
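For the independent case, that general shape could be sketched like this (the list of objects and the stacking are illustrative, not an existing API):

```r
library(errors)

K <- 5                        # draws
x1 <- set_errors(1:10, 0.05)  # N = 10 measurements of one quantity
x2 <- set_errors(11:20, 0.05) # a second quantity, so M = 2

# Stack K x N matrices of draws into a K x N x M array.
draws <- vapply(list(x1, x2), function(o)
  mapply(rnorm, K, as.numeric(o), errors(o)),
  matrix(0, K, 10))

dim(draws)
#> [1]  5 10  2
```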

@billdenney
Author

I just looked more closely at the documentation, and I apparently wasn't paying enough attention to the correl() and covar() functions. I had assumed that covar() supported a general covariance matrix that could be assigned across the elements of an errors object, such that

x <- set_errors(1:3, 0.1*(1:3))

would be equivalent to

x <- set_errors(1:3, diag(0.1*(1:3)))

i.e., that the values of x are independent (uncorrelated). But I now see that is incorrect; set_errors() does not accept covariance matrices.

Do you have plans to support multivariate correlation matrices like the example above?

@Enchufa2
Member

It does allow for covariance matrices, but these matrices must be decomposed in pairwise covariances between errors objects. See help("errors-package") for examples of this.
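For example (a minimal sketch of the pairwise setters; the numbers are made up):

```r
library(errors)

x <- set_errors(1:5, 0.1)
y <- set_errors(5:1, 0.1)

# Assign pairwise correlations between the two objects, element by element.
correl(x, y) <- rep(0.5, 5)

# The corresponding pairwise covariances: 0.5 * 0.1 * 0.1 = 0.005 each.
covar(x, y)
```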

Elements of an errors vector are uncorrelated by design. The concept behind errors is the same as the one behind units objects, i.e., a vector represents different measurements of the same quantity (e.g., repeated measurements of the same value, the evolution of a value at different times...).

Let me illustrate this with an example. Let's say we measured the position x of an object at t=1,...,10 s:

library(errors)

x <- set_errors(1:10, 0.05)
t <- set_errors(1:10, 0.05)

Each measurement has some uncertainty (the same for all values here for simplicity). Now we can compute the distance covered in each interval, and then the instantaneous velocity, which is constant here:

dx <- diff(x)
dt <- diff(t)
(v <- dx/dt)
#> Errors: 0.1 0.1 0.1 0.1 0.1 ...
#> [1] 1 1 1 1 1 1 1 1 1

Obviously, there should be a strong correlation between the instantaneous velocity and the distance covered in each interval. And here it is (1/√2, since v depends equally on the two independent quantities dx and dt, which carry equal relative uncertainties):

correl(v, dx)
#> [1] 0.7071068 0.7071068 0.7071068 0.7071068 0.7071068 0.7071068 0.7071068 0.7071068 0.7071068

But measurements belonging to the same vector are conceptually uncorrelated by design. In the same way, a units vector has a single unit for all the values, and this is why the mixed_units feature required using lists.

We need to keep this aligned in order to orchestrate these packages in the quantities project. If we find strong use cases for in-vector correlations, a mixed_errors class, based on lists, could be implemented.

@billdenney
Author

Thank you for the explanation, and sorry that I missed it in the documentation.

I understand the design decision, and for my primary use cases, I need a covariance matrix like what you’re describing as mixed_errors.

When I have multiple measurements, I am often unable to assume that they are uncorrelated, because there is usually some autoregressive correlation between the values. As an example, were I to take my blood glucose 10 times over the next 10 hours, the measurements would be relatively correlated, and measurements taken adjacent to each other would be more strongly correlated.

As I’m thinking of it more, I like your implementation as is, and I would love to see a mixed_errors class. I would hope that (at some point) the mixed_errors class could make use of efficient matrix operations for the correlation (and maybe that already happens under the surface).

@Enchufa2
Member

Enchufa2 commented Aug 29, 2018

The advantage of having different vectorised variables that can be operated on freely, without having to build an expression (as in the propagate package, for example), comes with a small disadvantage: if you have something complex like (a+b) * sin(x), R evaluates a+b, then sin(x), and finally the product of the results. This means that uncertainty is propagated on a step-by-step basis. However, this is done with a Taylor expansion (already expanded; it doesn't take derivatives each time), so it's fast. Another issue is how to store the vectors of covariances, but this is resolved internally with a hash table and automatic clean-up when variables are removed, and I think it's quite efficient.

We could support Monte Carlo propagation at some point, but if someone needs that, the propagate package is probably more appropriate, because it provides coverage intervals and so on.
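To illustrate the step-by-step propagation on that expression (a minimal sketch; the values and uncertainties are made up):

```r
library(errors)

a <- set_errors(2, 0.1)
b <- set_errors(3, 0.1)
x <- set_errors(0.5, 0.01)

# R evaluates a+b, then sin(x), then the product; the (pre-expanded)
# first-order Taylor propagation happens at each intermediate step.
(a + b) * sin(x)
```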

I'll keep this mixed_errors idea in mind for future releases. First, we should release a new version of units to see whether mixed_units is well-received or requires changes or adjustments. And my priority is to release an initial version of quantities soon.

As always, thanks for all the effort and feedback, very useful!
