NA/NaN gradient evaluation #383

Open
Blevy2 opened this issue Jan 31, 2024 · 7 comments

Blevy2 commented Jan 31, 2024

Hi Jim!

I am running into an issue that I am having trouble understanding, and I wanted to run it by you.

I am running VAST models on samples taken from spatial population simulation output for fish species that I have developed. In the spatial population models, the probability a fish moves to discrete cell $(i,j)$ in week $w$ is given by probability $Move_{w,i,j}$, where $Move_{w,i,j}$ depends on factors such as the water temperature in the given cell, $Temp_{w,i,j}$.

I want to compare the performance of VAST models with and without covariates, so I could reasonably provide VAST with $Move_{w,i,j}$, $Temp_{w,i,j}$, or both as covariates. Since $Temp_{w,i,j}$ is just one component of the actual movement probability $Move_{w,i,j}$, my assumption is that $Move_{w,i,j}$ would give VAST more information about species spatial preferences and thus produce a more accurate estimate than $Temp_{w,i,j}$. I am using a second-degree polynomial response when including covariates. For example, if using $Move$ as the covariate, I input

        X2_formula =  ~ poly(Move, degree=2 ) 
        X1_formula =  ~ poly(Move, degree=2 ) 
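For reference, a small base-R sketch of what `poly(Move, degree = 2)` contributes to the design matrix. The `Move` values here are simulated placeholders (an assumption for illustration), not values from the simulation described above.

```r
# Hypothetical covariate values standing in for Move (probabilities in [0, 1]).
set.seed(1)
Move <- runif(200)

# poly() with raw = FALSE (the default) returns orthogonal polynomial columns.
P <- poly(Move, degree = 2)

dim(P)                    # 200 rows, 2 columns (linear and quadratic terms)
round(crossprod(P), 10)   # approximately the 2 x 2 identity matrix
round(colSums(P), 10)     # each column sums to approximately zero

# poly() stops with an error if the covariate contains missing values,
# so it is worth checking the input before building the formula.
anyNA(Move)
all(is.finite(Move))
```

Because the columns are orthogonalised, the fitted coefficients are not directly the raw linear and quadratic slopes; `raw = TRUE` would give those instead.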

My models without covariates are all converging fine and my models that use $Temp_{w,i,j}$ as the covariate are also converging, but models that use $Move_{w,i,j}$ as the covariate all seem to run nearly to completion before they produce the error

        <simpleError in nlminb(start = startpar, objective = fn, gradient = gr, control = nlminb.control,     lower = lower, upper = upper): NA/NaN gradient evaluation>

There are a few GitHub issues for VAST that involve NA/NaN function evaluation, but none that involve NA/NaN gradient evaluation. The closest thing I could find is this issue thread in glmmTMB: glmmTMB/glmmTMB#164

Based on the discussion in the issue linked above, our best guess is that the gradient function created in VAST has a term something like log(exp(X)), where X is some parameter. During the final stage of a VAST run, possibly when the final Hessian is being calculated, exp(X) gets extremely small (e.g., 1E-320) and then underflows to zero in floating point, so log(exp(X)) is no longer finite. Does that sound reasonable?
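A minimal base-R sketch of the floating-point behaviour described above. It does not use VAST or TMB; the values of `x` are illustrative, and whether VAST's objective actually contains such a term remains a guess.

```r
# Illustrative values only; the last one is negative enough for exp() to underflow.
x <- c(-10, -700, -800)

exp(x)        # the last element underflows to exactly 0
log(exp(x))   # -Inf once exp(x) has underflowed

# The analytic derivative of log(exp(x)) is exp(x) / exp(x) = 1,
# but evaluated naively in floating point it becomes 0 / 0 = NaN.
naive_gradient <- exp(x) / exp(x)
naive_gradient

is.nan(naive_gradient)
```

The function value degrades to `-Inf` while the naive gradient becomes `NaN`, which would be consistent with an optimizer reporting a gradient problem rather than a function-evaluation problem.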

Are you familiar with this error? Do you know how to fix this so we can use $Move$ as a covariate?

Here is my sessionInfo():

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] dplyr_1.1.3 VAST_3.10.1 FishStatsUtils_2.12.1
[4] marginaleffects_0.15.1 units_0.8-4 TMB_1.9.6

loaded via a namespace (and not attached):
[1] utf8_1.2.4 R6_2.5.1 tidyselect_1.2.0 Matrix_1.6-1.1
[5] lattice_0.20-44 magrittr_2.0.3 INLA_23.05.30-1 splines_4.3.2
[9] glue_1.6.2 tibble_3.2.1 pkgconfig_2.0.3 generics_0.1.2
[13] lifecycle_1.0.4 cli_3.6.1 fansi_1.0.5 vctrs_0.6.4
[17] grid_4.3.2 data.table_1.14.4 compiler_4.3.2 sp_1.5-0
[21] pillar_1.9.0 Rcpp_1.0.11 rlang_1.1.2
[1] "input is"

Thanks for your help!

Ben

Collaborator

James-Thorson commented Jan 31, 2024 via email

Author

Blevy2 commented Jan 31, 2024

Hi Jim,

I just ran the code in the install TMB link and confirmed that TMB is working.

Do you think I need a different version of the Matrix package?

My sessionInfo shows Matrix_1.6-1.1 and TMB_1.9.6

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] dplyr_1.1.3 VAST_3.10.1 FishStatsUtils_2.12.1
[4] marginaleffects_0.15.1 units_0.8-4 TMB_1.9.6

loaded via a namespace (and not attached):
[1] utf8_1.2.4 R6_2.5.1 tidyselect_1.2.0 Matrix_1.6-1.1
[5] lattice_0.20-44 magrittr_2.0.3 INLA_23.05.30-1 splines_4.3.2
[9] glue_1.6.2 tibble_3.2.1 pkgconfig_2.0.3 generics_0.1.2
[13] lifecycle_1.0.4 cli_3.6.1 fansi_1.0.5 vctrs_0.6.4
[17] grid_4.3.2 data.table_1.14.4 compiler_4.3.2 sp_1.5-0
[21] pillar_1.9.0 Rcpp_1.0.11 rlang_1.1.2
[1] "input is"

Thanks for your help!

Ben

Collaborator

James-Thorson commented Jan 31, 2024 via email

Author

Blevy2 commented Jan 31, 2024

Hi Jim,

To clarify, this is being run in a high-performance computing (HPC) environment, which means I run R scripts from the command line and then look at the .out files.

I don't think there is a warning when library(TMB) is called, but I just searched the outfile for "warning" and found the following:

In addition: Warning messages:
1: In checkMatrixPackageVersion() :
Package version inconsistency detected.
TMB was built with Matrix version 1.5.4.1
Current Matrix version is 1.6.1.1
Please re-install 'TMB' from source using install.packages('TMB', type = 'source') or ask CRAN for a binary version of 'TMB' matching CRAN's 'Matrix' package
2: In dir.create(file.path(paste0(getwd(), "/sim_", sim_num, "/", CN, :
'/mnt/research/b.levy/VAST_Stuff/VAST_MixFishSim/10xKnots_MFS/sim_1/YTF/ConTemp/WCov_MoveCov/FALL/AllStrata' already exists
3: In file(file, "rt") :
cannot open file 'Index_wYearSeason.csv': No such file or directory

Do you agree that this implies I should have Matrix 1.5.4.1 instead of 1.6.1.1? I just want to be clear because this is a High Performance Computing environment so I need to communicate with the HPC administrator to coordinate package installations/changes.

If the Matrix package is off, any idea why I have never had a problem running models on the HPC previously? I have run thousands of models over the last few months on the HPC and only this one covariate type is causing an issue. I just want to try to understand the problem.
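A short sketch for checking the installed Matrix version at run time against the version named in the warning above. It uses only base R; the version string `1.5.4.1` is copied from that warning, and the `message()` text is illustrative rather than anything TMB itself prints.

```r
# Version of Matrix that TMB reports it was built with (from the warning text).
built_with <- package_version("1.5.4.1")

if (requireNamespace("Matrix", quietly = TRUE)) {
  # Version of Matrix currently installed in this R library.
  installed <- utils::packageVersion("Matrix")

  if (installed != built_with) {
    message("Matrix ", format(installed),
            " is installed, but TMB was built with ", format(built_with))
    # The warning itself suggests re-installing TMB from source:
    # install.packages('TMB', type = 'source')
  }
}
```

Logging this at the top of each HPC job would show which Matrix version each run actually used.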

thanks Jim!

Ben

Collaborator

James-Thorson commented Jan 31, 2024 via email

Author

Blevy2 commented Feb 5, 2024

To follow up on this issue, I worked with the HPC administrator to try two different combinations of Matrix and TMB that were shown in some of the issues related to this problem. Unfortunately they both produced the same error.

Next I tried a different covariate combination in the model, and that model also did not converge. In this case I was using both a static covariate ($Hab_{i,j}$) and a dynamic covariate ($Temp_{w,i,j}$) with the following response:

    X1_formula =  ~ poly(Temp, degree=2 ) + poly(Hab, degree=2 ) 
    X2_formula =  ~ poly(Temp, degree=2 )  + poly(Hab, degree=2 ) 

I then removed $Hab_{i,j}$ from X1_formula only, and the model converged without an issue:

    X1_formula =  ~ poly(Temp, degree=2 ) 
    X2_formula =  ~ poly(Temp, degree=2 )  + poly(Hab, degree=2 ) 

There is no problem when no covariate information is included.

So the problem only shows up for specific combinations of covariate input.

Author

Blevy2 commented Mar 6, 2024

Hi @James-Thorson-NOAA ,

I think I am seeing something else related to this issue, which is why I am posting here. I may consider a new issue though as it could be something different. Do you think this problem is related to having a package incompatibility?

In some models with covariates I am getting the error:

        <simpleError in if (any(On_bounds)) { problem_found = TRUE if (quiet == FALSE) { stop(paste0("\nCheck bounds for the following parameters: ", parameter_estimates$diagnostics[which(On_bounds), ])) }}: missing value where TRUE/FALSE needed>

I went ahead and printed out On_bounds and parameter_estimates at this step and see the following:

[1] "On_bounds is"
ln_H_input ln_H_input beta1_ft beta1_ft beta1_ft beta1_ft
NA NA NA NA NA NA
beta1_ft beta1_ft beta1_ft beta1_ft beta1_ft beta1_ft
NA NA NA NA NA NA
beta1_ft beta1_ft beta1_ft beta1_ft beta1_ft beta1_ft
NA NA NA NA NA NA
beta1_ft beta1_ft beta1_ft beta1_ft gamma1_cp gamma1_cp
NA NA NA NA NA NA
L_omega1_z L_epsilon1_z logkappa1 beta2_ft beta2_ft beta2_ft
NA NA NA NA NA NA
beta2_ft beta2_ft beta2_ft beta2_ft beta2_ft beta2_ft
NA NA NA NA NA NA
beta2_ft beta2_ft beta2_ft beta2_ft beta2_ft beta2_ft
NA NA NA NA NA NA
beta2_ft beta2_ft beta2_ft beta2_ft beta2_ft gamma2_cp
NA NA NA NA NA NA
gamma2_cp gamma2_cp gamma2_cp L_omega2_z L_epsilon2_z logkappa2
NA NA NA NA NA NA
logSigmaM
NA
[1] "parameter_estimates are"
$par
ln_H_input ln_H_input beta1_ft beta1_ft beta1_ft beta1_ft
NaN NaN NaN NaN NaN NaN
beta1_ft beta1_ft beta1_ft beta1_ft beta1_ft beta1_ft
NaN NaN NaN NaN NaN NaN
beta1_ft beta1_ft beta1_ft beta1_ft beta1_ft beta1_ft
NaN NaN NaN NaN NaN NaN
beta1_ft beta1_ft beta1_ft beta1_ft gamma1_cp gamma1_cp
NaN NaN NaN NaN NaN NaN
L_omega1_z L_epsilon1_z logkappa1 beta2_ft beta2_ft beta2_ft
NaN NaN NaN NaN NaN NaN
beta2_ft beta2_ft beta2_ft beta2_ft beta2_ft beta2_ft
NaN NaN NaN NaN NaN NaN
beta2_ft beta2_ft beta2_ft beta2_ft beta2_ft beta2_ft
NaN NaN NaN NaN NaN NaN
beta2_ft beta2_ft beta2_ft beta2_ft beta2_ft gamma2_cp
NaN NaN NaN NaN NaN NaN
gamma2_cp gamma2_cp gamma2_cp L_omega2_z L_epsilon2_z logkappa2
NaN NaN NaN NaN NaN NaN
logSigmaM
NaN

$objective
[1] NaN

$iterations
[1] 137

$evaluations
function gradient
234 138

$time_for_MLE
Time difference of 41.33874 mins

$max_gradient
[1] NaN

$Convergence_check
[1] NA

$number_of_coefficients
Total Fixed Random
44281 55 44226

$AIC
[1] NaN

$diagnostics
Param starting_value Lower MLE Upper final_gradient
1 ln_H_input 0.0000000 -Inf NaN Inf NaN
2 ln_H_input 0.0000000 -Inf NaN Inf NaN
3 beta1_ft 0.0000000 -Inf NaN Inf NaN
4 beta1_ft 0.0000000 -Inf NaN Inf NaN
5 beta1_ft 0.0000000 -Inf NaN Inf NaN
6 beta1_ft 0.0000000 -Inf NaN Inf NaN
7 beta1_ft 0.0000000 -Inf NaN Inf NaN
8 beta1_ft 0.0000000 -Inf NaN Inf NaN
9 beta1_ft 0.0000000 -Inf NaN Inf NaN
10 beta1_ft 0.0000000 -Inf NaN Inf NaN
11 beta1_ft 0.0000000 -Inf NaN Inf NaN
12 beta1_ft 0.0000000 -Inf NaN Inf NaN
13 beta1_ft 0.0000000 -Inf NaN Inf NaN
14 beta1_ft 0.0000000 -Inf NaN Inf NaN
15 beta1_ft 0.0000000 -Inf NaN Inf NaN
16 beta1_ft 0.0000000 -Inf NaN Inf NaN
17 beta1_ft 0.0000000 -Inf NaN Inf NaN
18 beta1_ft 0.0000000 -Inf NaN Inf NaN
19 beta1_ft 0.0000000 -Inf NaN Inf NaN
20 beta1_ft 0.0000000 -Inf NaN Inf NaN
21 beta1_ft 0.0000000 -Inf NaN Inf NaN
22 beta1_ft 0.0000000 -Inf NaN Inf NaN
23 gamma1_cp 0.0000000 -Inf NaN Inf NaN
24 gamma1_cp 0.0000000 -Inf NaN Inf NaN
25 L_omega1_z 1.0000000 -Inf NaN Inf NaN
26 L_epsilon1_z 1.0000000 -Inf NaN Inf NaN
27 logkappa1 -0.1053605 -Inf NaN Inf NaN
28 beta2_ft 0.0000000 -Inf NaN Inf NaN
29 beta2_ft 0.0000000 -Inf NaN Inf NaN
30 beta2_ft 0.0000000 -Inf NaN Inf NaN
31 beta2_ft 0.0000000 -Inf NaN Inf NaN
32 beta2_ft 0.0000000 -Inf NaN Inf NaN
33 beta2_ft 0.0000000 -Inf NaN Inf NaN
34 beta2_ft 0.0000000 -Inf NaN Inf NaN
35 beta2_ft 0.0000000 -Inf NaN Inf NaN
36 beta2_ft 0.0000000 -Inf NaN Inf NaN
37 beta2_ft 0.0000000 -Inf NaN Inf NaN
38 beta2_ft 0.0000000 -Inf NaN Inf NaN
39 beta2_ft 0.0000000 -Inf NaN Inf NaN
40 beta2_ft 0.0000000 -Inf NaN Inf NaN
41 beta2_ft 0.0000000 -Inf NaN Inf NaN
42 beta2_ft 0.0000000 -Inf NaN Inf NaN
43 beta2_ft 0.0000000 -Inf NaN Inf NaN
44 beta2_ft 0.0000000 -Inf NaN Inf NaN
45 beta2_ft 0.0000000 -Inf NaN Inf NaN
46 beta2_ft 0.0000000 -Inf NaN Inf NaN
47 beta2_ft 0.0000000 -Inf NaN Inf NaN
48 gamma2_cp 0.0000000 -Inf NaN Inf NaN
49 gamma2_cp 0.0000000 -Inf NaN Inf NaN
50 gamma2_cp 0.0000000 -Inf NaN Inf NaN
51 gamma2_cp 0.0000000 -Inf NaN Inf NaN
52 L_omega2_z 1.0000000 -Inf NaN Inf NaN
53 L_epsilon2_z 1.0000000 -Inf NaN Inf NaN
54 logkappa2 -0.1053605 -Inf NaN Inf NaN
55 logSigmaM 1.6094379 -Inf NaN Inf NaN
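
The `missing value where TRUE/FALSE needed` message is what base R raises when `if` is given `NA`. Below is a small sketch of how an all-`NaN` parameter vector can lead there. The bound comparison is an assumed stand-in for illustration, not the actual VAST code, and the names `mle`, `lower`, `upper`, and `status` are hypothetical.

```r
mle   <- c(NaN, NaN, NaN)   # estimates as printed in the diagnostics table
lower <- rep(-Inf, 3)
upper <- rep(Inf, 3)

# Hypothetical bound check: comparisons involving NaN yield NA.
on_bounds <- (mle <= lower) | (mle >= upper)
on_bounds
any(on_bounds)              # NA, not FALSE

# `if (NA)` is an error; capture its message to show it.
msg <- tryCatch(
  if (any(on_bounds)) "on bounds" else "ok",
  error = function(e) conditionMessage(e)
)
msg

# A guard that checks for non-finite estimates first avoids the error.
if (!all(is.finite(mle))) {
  status <- "optimizer returned non-finite estimates"
} else if (any(on_bounds)) {
  status <- "parameter on bound"
} else {
  status <- "ok"
}
status
```

Read this way, the bounds error looks like a downstream symptom of the optimizer returning `NaN` for every parameter, not a separate problem with the bounds themselves.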
