NA/NaN gradient evaluation #383

Blevy2 · 2024-01-31T19:04:40Z

Hi Jim!

I am running into an issue that I'm having trouble understanding, and I wanted to run it by you.

I am running VAST models on samples taken from the output of spatial population simulations that I developed for fish species. In these simulations, the probability that a fish moves to discrete cell $(i,j)$ in week $w$ is $Move_{w,i,j}$, which depends on factors such as the water temperature in that cell, $Temp_{w,i,j}$.

I want to compare the performance of VAST models with and without covariates, so I could reasonably provide VAST with $Move_{w,i,j}$, $Temp_{w,i,j}$, or both as covariates. Since $Temp_{w,i,j}$ is just one component of the actual movement probability $Move_{w,i,j}$, my assumption is that $Move_{w,i,j}$ would give VAST more information about the species' spatial preferences and thus yield a more accurate estimate than $Temp_{w,i,j}$. I am using a second-degree polynomial response when including covariates. For example, when using $Move$ as the covariate, I input:

        X2_formula = ~ poly(Move, degree = 2)
        X1_formula = ~ poly(Move, degree = 2)
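Since each covariate enters through poly(), a quick look at the raw covariate values before fitting may be worthwhile. The sketch below is a hypothetical pre-fit check: the helper summarize_covariate() and the simulated move values are illustrative assumptions, not part of VAST. It counts non-finite entries and unique values and confirms that poly(x, degree = 2) returns two orthonormal basis columns. Base R's poly() stops with an error on missing values and needs more unique values than the requested degree.

```r
# Hypothetical pre-fit check for a covariate such as Move or Temp.
# The helper name and the simulated values below are illustrative only.
summarize_covariate <- function(x) {
  finite <- is.finite(x)                 # FALSE for NA, NaN, Inf and -Inf
  list(
    n = length(x),
    n_nonfinite = sum(!finite),
    n_unique = length(unique(x[finite])),
    range = if (any(finite)) range(x[finite]) else c(NA_real_, NA_real_),
    sd = if (sum(finite) > 1) sd(x[finite]) else NA_real_
  )
}

# Made-up, right-skewed values in (0, 1) standing in for movement probabilities
set.seed(1)
move <- rbeta(200, shape1 = 0.5, shape2 = 20)
s <- summarize_covariate(move)
str(s)

# poly() needs finite values and more unique points than 'degree';
# it returns an orthonormal basis (unit-length, mutually orthogonal columns)
P <- poly(move, degree = 2)
round(crossprod(P), 10)
```

A heavily skewed covariate with many near-zero values is still valid input to poly(); the check only rules out the simplest data problems.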

My models without covariates all converge fine, and my models that use $Temp_{w,i,j}$ as the covariate also converge, but the models that use $Move_{w,i,j}$ as the covariate all seem to run nearly to completion before producing the error

        <simpleError in nlminb(start = startpar, objective = fn, gradient = gr, control = nlminb.control,     lower = lower, upper = upper): NA/NaN gradient evaluation>

There are a few GitHub issues for VAST that involve NA/NaN *function* evaluation, but none that involve NA/NaN *gradient* evaluation. The closest thing I could find is this issue thread in glmmTMB: glmmTMB/glmmTMB#164

Based on the discussion in that issue, our best guess is that the gradient function created in VAST contains a term like log(exp(X)), where X is some parameter. During the final stage of a VAST run, possibly when the final Hessian is being calculated, X becomes so negative that exp(X) falls below the smallest positive double-precision number (about 5E-324) and rounds to zero, so log(exp(X)) is no longer finite. Does that sound reasonable?
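The underflow part of this guess is easy to check in plain R. The sketch below is illustrative only and does not reproduce VAST's internals (whether VAST actually contains such a term is not established here): once exp(x) rounds to zero, log(exp(x)) evaluates to -Inf instead of x, and the chain-rule derivative exp(x)/exp(x) becomes 0/0, which is NaN. That is one way a non-finite gradient can appear even though the simplified expression is perfectly well behaved.

```r
# Illustrative only: naive evaluation of log(exp(x)) and its derivative
# at a very negative x, compared with the algebraically simplified form.
x <- -800

ex <- exp(x)              # underflows to exactly 0 in double precision
naive_value <- log(ex)    # log(0) is -Inf, not -800
naive_grad <- ex / ex     # d/dx log(exp(x)) = exp(x) / exp(x) -> 0/0 = NaN

stable_value <- x         # log(exp(x)) simplifies to x
stable_grad <- 1          # and its derivative is exactly 1

cat("exp(x):", ex, "\n")
cat("naive value:", naive_value, " naive gradient:", naive_grad, "\n")
cat("stable value:", stable_value, " stable gradient:", stable_grad, "\n")
```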

Are you familiar with this error? Do you know how to fix this so we can use $Move$ as a covariate?

Here is my sessionInfo():

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] dplyr_1.1.3 VAST_3.10.1 FishStatsUtils_2.12.1
    [4] marginaleffects_0.15.1 units_0.8-4 TMB_1.9.6

    loaded via a namespace (and not attached):
    [1] utf8_1.2.4 R6_2.5.1 tidyselect_1.2.0 Matrix_1.6-1.1
    [5] lattice_0.20-44 magrittr_2.0.3 INLA_23.05.30-1 splines_4.3.2
    [9] glue_1.6.2 tibble_3.2.1 pkgconfig_2.0.3 generics_0.1.2
    [13] lifecycle_1.0.4 cli_3.6.1 fansi_1.0.5 vctrs_0.6.4
    [17] grid_4.3.2 data.table_1.14.4 compiler_4.3.2 sp_1.5-0
    [21] pillar_1.9.0 Rcpp_1.0.11 rlang_1.1.2

Thanks for your help!

Ben


James-Thorson · 2024-01-31T19:12:58Z

Ben, I'm guessing that you have a mismatch between Matrix and TMB. Have you confirmed that TMB is working using the example here (https://github.com/pfmc-assessments/geostatistical_delta-GLMM/wiki/Steps-to-install-TMB)? Also, feel free to email me about tinyVAST (https://vast-lib.github.io/tinyVAST/). I'm maintaining VAST for the next couple of years, but my development efforts are focused on tinyVAST, which has similar functionality using a smaller code base and a regression interface.

Jim


Blevy2 · 2024-01-31T19:23:04Z

Hi Jim,

I just ran the code in the install TMB link and confirmed that TMB is working.

Do you think I need a different version of the Matrix package?

My sessionInfo() shows Matrix_1.6-1.1 and TMB_1.9.6.


Thanks for your help!

Ben

James-Thorson · 2024-01-31T19:24:12Z

When you start a new session and type library(TMB), do you get a warning message?
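On an HPC, where scripts run non-interactively, a load-time warning like this is easy to miss in the .out file. One option is to capture warnings explicitly and print them with a recognizable prefix. The helper below is a hypothetical sketch (capture_warnings() is an illustrative name built only on base R's withCallingHandlers()); it records each warning message and lets the wrapped expression run to completion.

```r
# Hypothetical helper: evaluate an expression, collect any warnings it
# raises, and return both the value and the warning messages.
capture_warnings <- function(expr) {
  msgs <- character(0)
  value <- withCallingHandlers(
    expr,  # the promise is forced here, so the handler below applies
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")  # record the warning, then keep going
    }
  )
  list(value = value, warnings = msgs)
}

# Example with a stand-in warning; in a job script this could wrap library(TMB)
res <- capture_warnings({
  warning("Package version inconsistency detected.")
  42
})
if (length(res$warnings) > 0) {
  cat("WARNING:", res$warnings, sep = "\n")
}
```

Wrapping library(TMB) this way would only surface the version check if it runs when the TMB namespace loads (an assumption here); if the namespace was already loaded earlier in the script, for example by another package that imports TMB, its load-time checks will not run again. A simpler alternative is options(warn = 1) at the top of the script, which makes R print each warning as it occurs.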


Blevy2 · 2024-01-31T19:32:38Z

Hi Jim,

To clarify, this is being run in a high-performance computing (HPC) environment, which means I submit command-line jobs that call R scripts and then read the resulting .out files.

I don't think there is a warning when library(TMB) is called, but I just searched the output file for "warning" and found the following:

    In addition: Warning messages:
    1: In checkMatrixPackageVersion() :
    Package version inconsistency detected.
    TMB was built with Matrix version 1.5.4.1
    Current Matrix version is 1.6.1.1
    Please re-install 'TMB' from source using install.packages('TMB', type = 'source') or ask CRAN for a binary version of 'TMB' matching CRAN's 'Matrix' package
    2: In dir.create(file.path(paste0(getwd(), "/sim_", sim_num, "/", CN, :
    '/mnt/research/b.levy/VAST_Stuff/VAST_MixFishSim/10xKnots_MFS/sim_1/YTF/ConTemp/WCov_MoveCov/FALL/AllStrata' already exists
    3: In file(file, "rt") :
    cannot open file 'Index_wYearSeason.csv': No such file or directory

Do you agree that this implies I should have Matrix 1.5.4.1 instead of 1.6.1.1? I want to be sure, because on the HPC I need to coordinate any package installations or changes with the administrator.
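For reference, the warning above itself suggests re-installing TMB from source (so that it is rebuilt against the installed Matrix) rather than changing Matrix. When many jobs each write their own .out file, a small scanner can pull out both versions. This is a hypothetical base-R sketch (find_matrix_mismatch() is an illustrative name); it matches the two message lines exactly as worded in the output above, so it would need adjusting if the wording differs.

```r
# Hypothetical helper: scan lines of an HPC .out file for TMB's
# Matrix-version warning and report the two versions it mentions.
find_matrix_mismatch <- function(lines) {
  built_line <- grep("TMB was built with Matrix version", lines, value = TRUE)
  current_line <- grep("Current Matrix version is", lines, value = TRUE)
  if (length(built_line) == 0 || length(current_line) == 0) {
    return(NULL)  # warning not present in this file
  }
  built <- numeric_version(trimws(sub(".*version ", "", built_line[1])))
  current <- numeric_version(trimws(sub(".*version is ", "", current_line[1])))
  list(built = built, current = current, mismatch = built != current)
}

# Usage on a job's output file (the file name is a placeholder):
# find_matrix_mismatch(readLines("job.out"))
out_lines <- c(
  "1: In checkMatrixPackageVersion() :",
  "Package version inconsistency detected.",
  "TMB was built with Matrix version 1.5.4.1",
  "Current Matrix version is 1.6.1.1"
)
res <- find_matrix_mismatch(out_lines)
print(res$mismatch)
```

Inside a running session, packageVersion("Matrix") and packageVersion("TMB") report the installed versions directly.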

If the Matrix version is the problem, do you have any idea why I never had a problem running models on the HPC before? I have run thousands of models there over the last few months, and only this one covariate type causes an issue. I just want to understand the problem.

Thanks, Jim!

Ben

James-Thorson · 2024-01-31T19:45:24Z

There are a huge number of issue threads about the Matrix/TMB mismatch in glmmTMB, sdmTMB, etc. It's been a headache :0 I'm guessing that TMB got updated and the version mismatch is now causing some problem with sparse-matrix stuff. Sorry that the HPC stuff is hard to debug! I'm hoping that this Matrix/TMB issue doesn't happen again.


Blevy2 · 2024-02-05T17:34:15Z

To follow up on this issue: I worked with the HPC administrator to try two different combinations of Matrix and TMB versions that were suggested in other issues about this problem. Unfortunately, both produced the same error.

Next, I tried a different covariate combination, and that model also failed to converge. In this case I used both a static covariate ($Hab_{i,j}$) and a dynamic covariate ($Temp_{w,i,j}$), with the following responses:

    X1_formula = ~ poly(Temp, degree = 2) + poly(Hab, degree = 2)
    X2_formula = ~ poly(Temp, degree = 2) + poly(Hab, degree = 2)

I then removed $Hab_{i,j}$ from X1_formula only, and the model converged without issue:

    X1_formula = ~ poly(Temp, degree = 2)
    X2_formula = ~ poly(Temp, degree = 2) + poly(Hab, degree = 2)

There is no problem when no covariate information is included.

So the problem only shows up for specific combinations of covariate input.
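To narrow down which covariate terms trigger the failure, the combinations could be enumerated instead of edited by hand, with one HPC job per row. This is a hypothetical base-R sketch (the object names are illustrative; the term strings mirror the formulas above): it builds every subset of the terms for each linear predictor, including intercept-only, and pairs the subsets for X1_formula and X2_formula.

```r
# Covariate terms mirroring the formulas above
covariate_terms <- c("poly(Temp, degree = 2)", "poly(Hab, degree = 2)")

# All subsets of the terms, starting with the empty (intercept-only) one
term_subsets <- list(character(0))
for (k in seq_along(covariate_terms)) {
  term_subsets <- c(term_subsets, combn(covariate_terms, k, simplify = FALSE))
}

# Turn a subset of terms into a one-sided formula string
subset_to_rhs <- function(s) {
  if (length(s) == 0) "~ 1" else paste("~", paste(s, collapse = " + "))
}
rhs <- vapply(term_subsets, subset_to_rhs, character(1))

# One row per (X1_formula, X2_formula) pairing
grid <- expand.grid(X1_formula = rhs, X2_formula = rhs, stringsAsFactors = FALSE)
print(grid)

# Inside job i, the strings would be converted back to formulas, e.g.
# X1_formula <- as.formula(grid$X1_formula[i])
```

With two terms this gives 16 pairings, which includes both the failing and the converging combinations described above.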

Blevy2 · 2024-03-06T21:52:21Z

Hi @James-Thorson-NOAA,

I think I am seeing something else related to this issue, which is why I am posting here, although it could be something different and may deserve a new issue. Do you think this problem is caused by a package incompatibility?

In some models with covariates I am getting the error:

I printed On_bounds and parameter_estimates at this step and see the following:

    [1] "On_bounds is"
      ln_H_input   ln_H_input     beta1_ft     beta1_ft     beta1_ft     beta1_ft
              NA           NA           NA           NA           NA           NA
        beta1_ft     beta1_ft     beta1_ft     beta1_ft     beta1_ft     beta1_ft
              NA           NA           NA           NA           NA           NA
        beta1_ft     beta1_ft     beta1_ft     beta1_ft     beta1_ft     beta1_ft
              NA           NA           NA           NA           NA           NA
        beta1_ft     beta1_ft     beta1_ft     beta1_ft    gamma1_cp    gamma1_cp
              NA           NA           NA           NA           NA           NA
      L_omega1_z L_epsilon1_z    logkappa1     beta2_ft     beta2_ft     beta2_ft
              NA           NA           NA           NA           NA           NA
        beta2_ft     beta2_ft     beta2_ft     beta2_ft     beta2_ft     beta2_ft
              NA           NA           NA           NA           NA           NA
        beta2_ft     beta2_ft     beta2_ft     beta2_ft     beta2_ft     beta2_ft
              NA           NA           NA           NA           NA           NA
        beta2_ft     beta2_ft     beta2_ft     beta2_ft     beta2_ft    gamma2_cp
              NA           NA           NA           NA           NA           NA
       gamma2_cp    gamma2_cp    gamma2_cp   L_omega2_z L_epsilon2_z    logkappa2
              NA           NA           NA           NA           NA           NA
       logSigmaM
              NA
    [1] "parameter_estimates are"
    $par
      ln_H_input   ln_H_input     beta1_ft     beta1_ft     beta1_ft     beta1_ft
             NaN          NaN          NaN          NaN          NaN          NaN
        beta1_ft     beta1_ft     beta1_ft     beta1_ft     beta1_ft     beta1_ft
             NaN          NaN          NaN          NaN          NaN          NaN
        beta1_ft     beta1_ft     beta1_ft     beta1_ft     beta1_ft     beta1_ft
             NaN          NaN          NaN          NaN          NaN          NaN
        beta1_ft     beta1_ft     beta1_ft     beta1_ft    gamma1_cp    gamma1_cp
             NaN          NaN          NaN          NaN          NaN          NaN
      L_omega1_z L_epsilon1_z    logkappa1     beta2_ft     beta2_ft     beta2_ft
             NaN          NaN          NaN          NaN          NaN          NaN
        beta2_ft     beta2_ft     beta2_ft     beta2_ft     beta2_ft     beta2_ft
             NaN          NaN          NaN          NaN          NaN          NaN
        beta2_ft     beta2_ft     beta2_ft     beta2_ft     beta2_ft     beta2_ft
             NaN          NaN          NaN          NaN          NaN          NaN
        beta2_ft     beta2_ft     beta2_ft     beta2_ft     beta2_ft    gamma2_cp
             NaN          NaN          NaN          NaN          NaN          NaN
       gamma2_cp    gamma2_cp    gamma2_cp   L_omega2_z L_epsilon2_z    logkappa2
             NaN          NaN          NaN          NaN          NaN          NaN
       logSigmaM
             NaN

    $objective
    [1] NaN

    $iterations
    [1] 137

    $evaluations
    function gradient
         234      138

    $time_for_MLE
    Time difference of 41.33874 mins

    $max_gradient
    [1] NaN

    $Convergence_check
    [1] NA

    $number_of_coefficients
    Total Fixed Random
    44281    55  44226

    $AIC
    [1] NaN

    $diagnostics
              Param starting_value Lower MLE Upper final_gradient
     1   ln_H_input      0.0000000  -Inf NaN   Inf            NaN
     2   ln_H_input      0.0000000  -Inf NaN   Inf            NaN
     3     beta1_ft      0.0000000  -Inf NaN   Inf            NaN
     4     beta1_ft      0.0000000  -Inf NaN   Inf            NaN
     5     beta1_ft      0.0000000  -Inf NaN   Inf            NaN
     6     beta1_ft      0.0000000  -Inf NaN   Inf            NaN
     7     beta1_ft      0.0000000  -Inf NaN   Inf            NaN
     8     beta1_ft      0.0000000  -Inf NaN   Inf            NaN
     9     beta1_ft      0.0000000  -Inf NaN   Inf            NaN
    10     beta1_ft      0.0000000  -Inf NaN   Inf            NaN
    11     beta1_ft      0.0000000  -Inf NaN   Inf            NaN
    12     beta1_ft      0.0000000  -Inf NaN   Inf            NaN
    13     beta1_ft      0.0000000  -Inf NaN   Inf            NaN
    14     beta1_ft      0.0000000  -Inf NaN   Inf            NaN
    15     beta1_ft      0.0000000  -Inf NaN   Inf            NaN
    16     beta1_ft      0.0000000  -Inf NaN   Inf            NaN
    17     beta1_ft      0.0000000  -Inf NaN   Inf            NaN
    18     beta1_ft      0.0000000  -Inf NaN   Inf            NaN
    19     beta1_ft      0.0000000  -Inf NaN   Inf            NaN
    20     beta1_ft      0.0000000  -Inf NaN   Inf            NaN
    21     beta1_ft      0.0000000  -Inf NaN   Inf            NaN
    22     beta1_ft      0.0000000  -Inf NaN   Inf            NaN
    23    gamma1_cp      0.0000000  -Inf NaN   Inf            NaN
    24    gamma1_cp      0.0000000  -Inf NaN   Inf            NaN
    25   L_omega1_z      1.0000000  -Inf NaN   Inf            NaN
    26 L_epsilon1_z      1.0000000  -Inf NaN   Inf            NaN
    27    logkappa1     -0.1053605  -Inf NaN   Inf            NaN
    28     beta2_ft      0.0000000  -Inf NaN   Inf            NaN
    29     beta2_ft      0.0000000  -Inf NaN   Inf            NaN
    30     beta2_ft      0.0000000  -Inf NaN   Inf            NaN
    31     beta2_ft      0.0000000  -Inf NaN   Inf            NaN
    32     beta2_ft      0.0000000  -Inf NaN   Inf            NaN
    33     beta2_ft      0.0000000  -Inf NaN   Inf            NaN
    34     beta2_ft      0.0000000  -Inf NaN   Inf            NaN
    35     beta2_ft      0.0000000  -Inf NaN   Inf            NaN
    36     beta2_ft      0.0000000  -Inf NaN   Inf            NaN
    37     beta2_ft      0.0000000  -Inf NaN   Inf            NaN
    38     beta2_ft      0.0000000  -Inf NaN   Inf            NaN
    39     beta2_ft      0.0000000  -Inf NaN   Inf            NaN
    40     beta2_ft      0.0000000  -Inf NaN   Inf            NaN
    41     beta2_ft      0.0000000  -Inf NaN   Inf            NaN
    42     beta2_ft      0.0000000  -Inf NaN   Inf            NaN
    43     beta2_ft      0.0000000  -Inf NaN   Inf            NaN
    44     beta2_ft      0.0000000  -Inf NaN   Inf            NaN
    45     beta2_ft      0.0000000  -Inf NaN   Inf            NaN
    46     beta2_ft      0.0000000  -Inf NaN   Inf            NaN
    47     beta2_ft      0.0000000  -Inf NaN   Inf            NaN
    48    gamma2_cp      0.0000000  -Inf NaN   Inf            NaN
    49    gamma2_cp      0.0000000  -Inf NaN   Inf            NaN
    50    gamma2_cp      0.0000000  -Inf NaN   Inf            NaN
    51    gamma2_cp      0.0000000  -Inf NaN   Inf            NaN
    52   L_omega2_z      1.0000000  -Inf NaN   Inf            NaN
    53 L_epsilon2_z      1.0000000  -Inf NaN   Inf            NaN
    54    logkappa2     -0.1053605  -Inf NaN   Inf            NaN
    55    logSigmaM      1.6094379  -Inf NaN   Inf            NaN
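With 55 fixed effects, it can help to summarize a diagnostics table like the one printed above by parameter name instead of reading it row by row. The sketch below is a hypothetical base-R helper (summarize_nonfinite() is an illustrative name) that assumes only the column names shown in the printout: Param, MLE and final_gradient. In the output above every row is non-finite, which suggests the failure is not confined to a single parameter group.

```r
# Hypothetical helper: count non-finite MLEs and final gradients per
# parameter name in a diagnostics table with columns Param, MLE and
# final_gradient (as in the printout above).
summarize_nonfinite <- function(diagnostics) {
  bad_mle <- !is.finite(diagnostics$MLE)
  bad_grad <- !is.finite(diagnostics$final_gradient)
  params <- unique(diagnostics$Param)
  count_by <- function(flag) {
    vapply(params, function(p) sum(flag[diagnostics$Param == p]), integer(1))
  }
  data.frame(
    Param = params,
    n = count_by(rep(TRUE, nrow(diagnostics))),
    nonfinite_MLE = count_by(bad_mle),
    nonfinite_gradient = count_by(bad_grad),
    row.names = NULL
  )
}

# Small made-up example; a real call would pass the diagnostics data frame
diag_example <- data.frame(
  Param = c("beta1_ft", "beta1_ft", "logkappa1", "logSigmaM"),
  MLE = c(NaN, 0.31, NaN, 1.2),
  final_gradient = c(NaN, 1e-4, NaN, -2e-5)
)
print(summarize_nonfinite(diag_example))
```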
