Possible bug in degenerate EM update for U #156

Open · ashaffer opened this issue Jan 27, 2022 · 12 comments
Labels: bug, question

@ashaffer

I've been implementing some of the models described in [your paper](https://cran.r-project.org/web/packages/MARSS/vignettes/EMDerivation.pdf) in Python, using this library to some extent as a reference implementation. However, I recently ran into something that I can't seem to reconcile: either I don't understand the paper correctly, or something is not quite right in the library's implementation.

Let's take a simple example, suppose we have an AR(2) model, like so:

B <- matrix(c(0.5, 0.5, 1, 0), 2)
Q <- matrix(c(1, 0, 0, 0), 2)
tinitx <- 0
V0 <- matrix(c(0, 0, 0, 0), 2)
U <- matrix(c('u1', 'u2'), 2, 1)

My understanding from the paper is that the indicator matrix for the deterministic rows at t=0 should be the identity, and the library matches that. But at t=1 in this model, the second component of the state is still deterministic. As implemented now, the library appears to treat the entire state vector as non-deterministic at t=1, which leads to a non-invertible denominator.

Sorry if I'm just misunderstanding something simple; that's entirely possible.

@ashaffer added the bug label Jan 27, 2022
@eeholmes (Collaborator)

Hi, great to hear that MARSS is getting translated to Python! I have had some inquiries about that but didn't have the bandwidth to work on it. I'll be keen to see how it works out.

Regarding the question: think of the stochasticity as a network. Is a state at time t stochastic (via Q at time t), or is it connected to a stochastic state via B? In your case, with a non-zero B(2,1) value, state 2 is always stochastic because it is connected to state 1. Note that if you had a stochastic x0 (at t=1 or t=0), you would also have to check for a connection to that stochastic x.

Keep in mind that the connection via B is not just at t-1; you have to think of it as a network. The derivation talks about raising B to a power to find the network connections at time t+n. It can be rather slow to compute this "connection" matrix (B raised to a power) to find which states with Q=0 are actually stochastic via a connection to a stochastic state through B.
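For what it's worth, here is a minimal R sketch of that idea (illustrative only, not the library's internal code; the function name is made up): propagate the non-zero pattern of Q through powers of B to see which states are stochastic n steps after a fully deterministic start.

B <- matrix(c(0.5, 1, 0.5, 0), nrow = 2)  # the companion-form B from the example
Q <- matrix(c(1, 0, 0, 0), nrow = 2)      # only state 1 has its own process error

stochastic_after <- function(B, Q, n) {
  direct <- diag(Q) != 0                  # states with their own process error
  reach <- direct
  Bpow <- diag(nrow(B))
  for (k in seq_len(n)) {                 # add connections via B, B^2, ..., B^n
    Bpow <- Bpow %*% B
    reach <- reach | (abs(Bpow) %*% direct > 0)
  }
  as.vector(reach)
}

stochastic_after(B, Q, 0)  # one step after a fixed x0: TRUE FALSE (state 2 still deterministic)
stochastic_after(B, Q, 1)  # two steps on: TRUE TRUE (state 2 is stochastic via its connection to state 1)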

That said, it is possible there is a bug somewhere, so definitely keep me updated on what you find (with code to replicate) and I'll dig into it.

@eeholmes self-assigned this Jan 27, 2022
@ashaffer (Author) commented Jan 28, 2022

Great, I'll make sure to pass it along if I get my Python version into a publishable state.

So, the issue I'm concerned with is at t=1 in the model I described. Certainly at t=2 and beyond both states are non-deterministic. My understanding from reading the paper is that:

  • at t = 0: all states are deterministic,
  • at t = 1: the first state is directly stochastic and the second state is still deterministic,
  • at t ≥ 2: state 1 is directly stochastic, state 2 is indirectly stochastic, and this is now the steady state.

What I think is happening in the library instead is that at this line the IId matrix is updated to its t=2 form, but then at this subsequent line it is assumed to still be in its t=1 form. Essentially it seems like an off-by-one, where IId is updated one time too many early on.
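To make that concrete, here is my reading of what the indicator of deterministic rows should be at each time step for this model (a hedged illustration with made-up names, not the library's actual objects), with tinitx = 0 and V0 = 0:

IId_t0 <- diag(c(1, 1))  # t = 0: both states deterministic (x0 is fixed)
IId_t1 <- diag(c(0, 1))  # t = 1: state 1 gets process error, state 2 is still deterministic
IId_t2 <- diag(c(0, 0))  # t >= 2: state 2 becomes indirectly stochastic via B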

I tried making a couple of changes that do it the way I was imagining it, which you can see here:

ashaffer@62c8744

Here's an example of something that doesn't work without the patch:

control = list(maxit=100, minit=1, trace = 1)
model <- list(
    B = matrix(list(0.5, 1, 0.5, 0), nrow=2), 
    U = matrix(list("u1", "u2"), 2, 1), 
    Q = matrix(c(1, 0, 0, 0), nrow=2), 
    A = matrix(c(0)), 
    Z = matrix(1, 1, 2), 
    R = diag(1), 
    x0 = matrix(c(0, 0)), 
    tinitx = 0
)

rows = 200
dim = 1
y = matrix(1:rows*dim, dim, rows)

fit.0 <- MARSS(y, model = model, control=control, fun.kf="MARSSkfss")

I'm just using such a trivial dataset because I wanted something easy to generate identically in Python, so I could compare my EM updates to yours. The model isn't intended to mean or represent anything useful, but without the patch it fails to fit with a "denominv" error.

I'm just learning all of this myself, though, so sorry if I'm not understanding it correctly.

@eeholmes (Collaborator)

Thanks for the example code and new explanation! I will dive into it next week.

@eeholmes (Collaborator)

Goodness, the degenerate section of EMDerivation.pdf is exceedingly confusing and has quite a few typos. I am still a ways from determining whether what you describe is a bug or not.

However, note that the issue of u not being estimable is not related to this "bug" (if it is indeed a bug). It has to do with the u rows associated with indirectly stochastic x not being estimable. See section 7.2.2 in EMDerivation.pdf and the last paragraph in that section. It is a constraint arising from the partial derivative of the expected log-likelihood not being possible. I will ponder how to better explain why partial differentiation with respect to u^is is not possible, but read the end of paragraph 2 in section 7.2.2 and maybe you'll see why. If x_t = B x_{t-1} + u for the indirectly stochastic x, then it is impossible to hold x_t and x_{t-1} constant while varying u, which is what has to be done for partial differentiation in the M step of the EM algorithm. For the stochastic x, x_t = B x_{t-1} + u + w_t, so it is possible since w_t is an error term.
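Sketched out in the same notation (my paraphrase of that point, not a quote from the derivation):

x_t^{is} = (B x_{t-1} + u)^{is}            =>  u^{is} = x_t^{is} - (B x_{t-1})^{is}
x_t^{s}  = (B x_{t-1} + u)^{s} + w_t^{s}

For the indirectly stochastic rows there is no error term, so u is pinned down by x_t and x_{t-1} and cannot be varied while the states are held fixed; for the stochastic rows, w_t absorbs the difference, so the partial derivative with respect to u exists.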

@ashaffer (Author)

Ah, yes, you're totally right about that. The constraint section for the degenerate variance condition is the section I probably struggled to parse the most. I found most of the paper really clearly written, though; I especially appreciated things like the primer on matrix calculus peppered throughout. The level of detail and clarity was unusually good for this kind of material, at least in my experience.

@eeholmes (Collaborator)

So, yes, it's a bug. The line you identified is correct for t > (tinitx+1) (which is t=2 or t=3 depending on tinitx) but not correct for t = tinitx+1 (so t=1 for tinitx=0 and t=2 for tinitx=1). The M matrix (the matrix that shows the connectivity) needs to be different for tinitx+1, since x at tinitx could be deterministic (as it is in the model you specified).

I'll have to think a bit how to fix this so it works correctly for all V0 matrices.

@eeholmes (Collaborator) commented Feb 1, 2022

I have implemented a correction to this bug on the 'degenfix' branch. Thanks for alerting me to this!

With the fix, your code runs. It shouldn't run: I should block this model in MARSSkemcheck as part of the degenerate model constraints. But I am not going to do that right away, because I want to revisit my logic for why estimation of the u associated with an indirectly stochastic x (so in this case u2) should be impossible. Your model with

U = matrix(list("u1", "u2"), 2, 1)

should definitely be blocked by MARSSkemcheck. But I am not sure about

U = matrix(list("u1", "u1"), 2, 1)

I am also updating the EMDerivation section, though it will still be very very dense. The derivation is tedious and I have yet to figure out a clear way to write it out.

@ashaffer (Author) commented Feb 1, 2022

Great, thanks for the quick turnaround. Glad I didn't end up wasting your time with this. I should have mentioned it before, but I think the same issue might be present in the EM update for x0 too.

Re: the degenerate variance section, one small thing I still don't entirely understand is the notation in equation 204 for delta_4 at t = 0. It's expressed as:

0_{m×m} D_{1,u}

This is actually what led me to check out the library's code in the first place. I see that in the library it is just initialized to Du, which makes sense, but I still don't quite understand what the 0_{m×m} part of the notation is trying to express, and the same goes for delta_3.

Overall, though, the clarity of the derivation is exceptional; I've implemented almost all of it directly from the paper over the last month or so. In most cases where I struggled for a while, it ended up being because I hadn't read the prior sections as thoroughly as I should have, and going back and re-reading cleared things up. It's complicated in places, but the underlying subject matter is complicated, so there's only so much you can do to make it simple.

@eeholmes (Collaborator) commented Feb 1, 2022

Ah, yes, the same issue is in the x0 update, but I fear there is a problem somewhere: adding an NA causes the LL to drop.

y = matrix(1:rows*dim*2, dim, rows); y[,1]<-NA

The estimates are fine. Compare:

control = list(maxit=500, minit=500, trace = 1)
y = matrix(1:rows*dim*2, dim, rows); y[,1]<-NA
fit.0 <- MARSS(y, model = model, control=control)
# iter=3 LogLike DROPPED.  old=-297.88307358561 new=-297.890748299677
fit.0 <- MARSS(y, model = model, method="BFGS")

Back to the drawing board. Something subtle is wrong.

@ashaffer (Author) commented Feb 1, 2022

What's the model being used here?

@eeholmes (Collaborator) commented Feb 1, 2022

Same one as you posted.

model <- list(
    B = matrix(list(0.5, 1, 0.5, 0), nrow=2), 
    U = matrix(list("u1", "u2"), 2, 1), 
    Q = matrix(c(1, 0, 0, 0), nrow=2), 
    A = matrix(c(0)), 
    Z = matrix(1, 1, 2), 
    R = diag(1), 
    x0 = matrix(c(0, 0)), 
    tinitx = 0
)

I think this is related to the issue of estimating u associated with the indirectly stochastic x rows, so in this case u2. I could just block it via MARSSkemcheck, but curiously the one deterministic step from t=0 to t=1 should allow estimation: not much information, but it should work. So there is something subtle amiss in the derivation.

I'll dive into it again later today, but it will probably take a few days to sort out.

@ashaffer (Author) commented Feb 3, 2022

I might have noticed another small inconsistency in the calculation of x0:

https://github.com/atsa-es/MARSS/blob/degenfix/R/MARSSkem.r#L491

Compared to equation 209 in the derivation, the numerator seems to be missing the bold-L factor (star$V0 in the code) in its calculation.
