vector format in defaults #188

Open
jfree-man opened this issue Mar 19, 2024 · 8 comments

Comments

@jfree-man
Collaborator

I think we need a better way to pass vectors as defaults. Could we have an mp_vector(y, labels) function that works similarly to mp_zero_vector, taking a numeric vector y and an optional character vector labels for cases where we want y to have names? This function would return the vector in the correct format for the simulator.
A first step would be to better document the types of objects that can be passed as defaults. Other users will likely need to read in defaults from external files, and getting these vectors into the right format for the simulator isn't obvious.

The hypothetical mp_vector function has optional labels because:

  • Sometimes we want the vector passed to the default list to have names(), for example when the model specification uses both a state vector and individual state names in expressions. The object class of the vector, together with its attributes (names and dimnames), seems to determine whether the simulator errors. See here for some experiments/motivation for this.
  • Sometimes the vector passed to the default list doesn't need names(). For example, for the NFDS model we read in an initial state vector from a csv file. This vector doesn't need labels because the state vector is ordered. I used as.matrix() and unname() to get the vector into a format the simulator accepts.
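To make the proposal concrete, here is a minimal sketch of what such a helper might look like. The function does not exist in macpan2 as of this thread; its name comes from the proposal above, and its behaviour (returning a plain numeric vector, with names only when labels are supplied) is an assumption about what "the correct format for the simulator" could mean.

```r
# Hypothetical helper: `mp_vector` is only proposed in this issue, and the
# behaviour below is an assumption for discussion, not the package's API.
mp_vector <- function(y, labels = NULL) {
  stopifnot(is.numeric(y))
  # Strip names, dim and dimnames so we start from a bare numeric vector.
  y <- unname(as.vector(y))
  if (!is.null(labels)) {
    stopifnot(is.character(labels), length(labels) == length(y))
    names(y) <- labels
  }
  y
}

# Named version, for models that refer to individual states by name.
state_named <- mp_vector(c(99, 1, 0), labels = c("S", "I", "R"))

# Unnamed version, for models that rely on the ordering of the vector only.
state_unnamed <- mp_vector(matrix(c(99, 1, 0), ncol = 1))
```

Whether the simulator would prefer a bare vector or a one-column matrix is left open here; the sketch only shows how the optional labels argument could work.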
@stevencarlislewalker
Member

I'm thinking about this stuff now. In the meantime, I feel like just passing ordinary numeric matrices either with or without names should be sufficient. The standard R setNames function could be useful to get the functionality that you want. Does this make sense?

@jfree-man
Collaborator Author

Makes sense, not an urgent concern because I can work around this. I just tested using setNames on a matrix, and it doesn't seem to work...

setNames(state_matrix, names(sir_defaults))

I think it's common for people to work with data.frames, especially since that is what read.table returns. So I'm wondering how we can communicate the format the table needs to be in. I'll think on this.
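As a rough illustration of the data.frame case, the sketch below reads a small table and converts it into the two shapes discussed in this thread: an unnamed numeric matrix and a named numeric vector. The inline CSV text and its column names (state, value) are made up for the example; only base R functions are used.

```r
# Stand-in for a file on disk; the column names are illustrative assumptions.
csv_text <- "state,value
S,99
I,1
R,0"
init <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Unnamed column matrix: as.matrix() keeps the column name in dimnames,
# and unname() then drops the dimnames.
state_unnamed <- unname(as.matrix(init["value"]))

# Named numeric vector: setNames() works here because the input is a
# plain vector, not a matrix.
state_named <- setNames(as.numeric(init$value), init$state)
```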

@stevencarlislewalker
Member

stevencarlislewalker commented Mar 19, 2024

Thanks @jfree-man. I'll think as well. Please put any notes you might have here. Thank you!

@bbolker
Collaborator

bbolker commented Mar 19, 2024

setNames indeed doesn't work on matrices. Weirdly, matrices have "colnames", not "names". We could certainly provide a helper function ...

setColnames <- function(x, nm) {
    colnames(x) <- nm
    return(x)
}

Does that help?
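A small base-R check of the difference, using a made-up 1 x 3 state matrix: setNames() attaches a names attribute to the matrix but leaves its column names empty, while the helper above sets the column names and leaves names() empty.

```r
# Helper from the comment above, repeated so the example is self-contained.
setColnames <- function(x, nm) {
    colnames(x) <- nm
    return(x)
}

# Illustrative values; not taken from any model in this thread.
state_matrix <- matrix(c(99, 1, 0), nrow = 1)
labels <- c("S", "I", "R")

with_names <- setNames(state_matrix, labels)
with_colnames <- setColnames(state_matrix, labels)

# setNames() sets the "names" attribute, one name per element,
# but the matrix still has no column names.
names(with_names)
colnames(with_names)

# setColnames() sets the column names, but names() stays NULL.
colnames(with_colnames)
names(with_colnames)
```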

@jfree-man
Collaborator Author

It helps to know this information. So we can't use matrices with colnames (at the moment), because I think the simulator is expecting a names argument. We can pass unnamed matrices.

@jfree-man
Collaborator Author

jfree-man commented Mar 28, 2024

When comparing the high- and low-level calibration interfaces with different vector defaults:

  1. The high-level interface needs to be updated to allow vector defaults in addition to scalars (see TMBPar here).
  2. After 1. has been addressed (by commenting out the relevant lines), I tested four vector-default scenarios for estimation:
    i. un-named numeric vector - The low- and high-level interfaces both converge on optimization, the number of iterations is the same or similar, and the estimates/SE are similar.
    > # compare convergence & iterations
    > opt_low$converge == opt_high$converge 
    [1] TRUE
    > opt_low$iterations == opt_high$iterations
    [1] TRUE
    > # compare estimates
    > mp_tmb_coef(cal_low, conf.int=TRUE)
    outer mgc:  6.801212e-10 
    outer mgc:  0.01737385 
    outer mgc:  0.01740903 
    outer mgc:  0.006632762 
    outer mgc:  0.006633656 
    outer mgc:  0.7287053 
          term   mat row col default  type estimate std.error conf.low  conf.high
    1   params theta   0   0     0.2 fixed  1.79717  1.506615 -1.15574   4.750081
    2 params.1 theta   1   0    50.0 fixed 92.65136  3.899803 85.00789 100.294835
    > mp_tmb_coef(cal_high, conf.int=TRUE)
    outer mgc:  5.812219e-11 
    outer mgc:  0.02927333 
    outer mgc:  0.02933655 
    outer mgc:  0.008285214 
    outer mgc:  0.008286417 
    outer mgc:  3.467646 
          term   mat row col default  type  estimate std.error  conf.low  conf.high
    1   params theta   0   0     0.2 fixed  1.555729  1.335973 -1.062731   4.174188
    2 params.1 theta   1   0    50.0 fixed 91.781919  4.679631 82.610010 100.953828
    
    ii. named numeric vector - At the optimization step (mp_optimize), the low-level interface converges and the high-level interface errors with 'arguments imply differing number of rows: 2, 0'.
    > opt_low$converge
    [1] 0
    > opt_low$iterations
    [1] 23
    > mp_tmb_coef(cal_low, conf.int=TRUE)
    outer mgc:  1.69503e-05 
    outer mgc:  10.65904 
    outer mgc:  10.5144 
    outer mgc:  0.02968575 
    outer mgc:  0.02965414 
    outer mgc:  20.64763 
          term   mat row col default  type   estimate  std.error     conf.low  conf.high
    1   params theta   0   0     0.2 fixed  0.1640094   2.151371    -4.052601    4.38062
    2 params.1 theta   1   0    50.0 fixed 38.7778367 767.611547 -1465.713149 1543.26882
    > opt_high = mp_optimize(cal_high)
    outer mgc:  27.15896 
    outer mgc:  1.062276 
    outer mgc:  0.2386499 
    outer mgc:  0.2036293 
    outer mgc:  0.8310266 
    outer mgc:  0.704181 
    outer mgc:  2.347819 
    outer mgc:  0.1947629 
    outer mgc:  0.253889 
    outer mgc:  1.025382 
    outer mgc:  0.834913 
    outer mgc:  0.828891 
    outer mgc:  0.9898317 
    outer mgc:  0.6263391 
    outer mgc:  0.6314087 
    outer mgc:  0.5386188 
    outer mgc:  0.5354343 
    outer mgc:  1.208925 
    outer mgc:  0.2193686 
    outer mgc:  0.2406963 
    outer mgc:  0.9735105 
    outer mgc:  0.2260358 
    outer mgc:  0.2410839 
    outer mgc:  0.2385732 
    outer mgc:  0.2364942 
    outer mgc:  0.2344428 
    outer mgc:  0.2324271 
    outer mgc:  0.230446 
    outer mgc:  0.2284991 
    outer mgc:  0.2265858 
    outer mgc:  0.8440269 
    outer mgc:  0.05355716 
    outer mgc:  0.06191667 
    outer mgc:  0.2488219 
    outer mgc:  0.2860194 
    outer mgc:  0.1565505 
    outer mgc:  0.1576296 
    outer mgc:  0.1565257 
    outer mgc:  0.1218892 
    outer mgc:  0.09986208 
    outer mgc:  0.06530448 
    outer mgc:  0.04810544 
    outer mgc:  0.02093714 
    outer mgc:  0.009328867 
    outer mgc:  0.001122002 
    Error in data.frame(row = row, col = col, value = as.vector(x)) : 
       arguments imply differing number of rows: 2, 0
    
    iii. named numeric data.frame (row vector) - The default-values table in the model specification comes out malformed (it contains multiple 'value' columns).
    > print(spec)
    ---------------------
    Default values:
    ---------------------
     matrix row       col value.beta value.R_initial value
      theta   1      beta        0.2               0      
      theta   1 R_initial        0.2               0      
      gamma                                            0.1
          N                                            100
          I                                              1
    
    ---------------------
    Before the simulation loop (t = 0):
    ---------------------
    1: R ~ theta[R_initial_ind]
    2: S ~ N - I - R
    
    ---------------------
    At every iteration of the simulation loop (t = 1 to T):
    ---------------------
    1: infection ~ S * I * theta[beta_ind]/N
    2: recovery ~ gamma * I
    3: S ~ S - infection
    4: I ~ I + infection - recovery
    5: R ~ R + recovery
    
    iv. un-named numeric data.frame (col vector) - Attempting to print the model specification gives the same error, 'arguments imply differing number of rows: 2, 0'.
    > print(spec)
    ---------------------
    Default values:
    ---------------------
    Error in data.frame(row = row, col = col, value = as.vector(x)) : 
      arguments imply differing number of rows: 2, 0
    

To reproduce these results,

  1. Comment out lines in TMBPar as above and rebuild package
  2. Choose a default vector format by commenting out appropriate lines here
  3. Run testing_calibration_interface.R
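For readers without access to the test script, the four default formats compared above could be built in base R roughly as follows. This is a sketch, not the contents of testing_calibration_interface.R: the object names are hypothetical, and the values and labels are read off the default column and the beta/R_initial labels in the output above.

```r
# Illustrative values and labels for the theta vector (assumed from the
# output above, not copied from the actual test script).
theta_values <- c(0.2, 50)
theta_labels <- c("beta", "R_initial")

# i. un-named numeric vector
theta_i <- theta_values

# ii. named numeric vector
theta_ii <- setNames(theta_values, theta_labels)

# iii. named numeric data.frame (row vector): one row, one named column
# per element
theta_iii <- as.data.frame(as.list(theta_ii))

# iv. un-named numeric data.frame (col vector): two rows, one column,
# with the column name removed
theta_iv <- unname(data.frame(theta_values))
```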

@stevencarlislewalker
Member

Thanks! Would you mind pasting outputs from example results in 2?

@stevencarlislewalker
Member

Thanks Jen. I didn't even think about 2iii and 2iv, but they are reasonable, so thanks for adding them.
