Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Forecasting/estimating missing years with Seasonality #337

Open
BrittanyTroast-NOAA opened this issue Aug 31, 2022 · 1 comment
Open

Forecasting/estimating missing years with Seasonality #337

BrittanyTroast-NOAA opened this issue Aug 31, 2022 · 1 comment

Comments

@BrittanyTroast-NOAA
Copy link

I am running a seasonal model and forecasting a few years past the historical data. The model keeps failing (zero at starting parameters). I believe I have isolated the problem down to when I'm adding the forecasted years (or mimicking it by making one of the historical years Catch=NA to make the model estimate the missing year). Is there something I am missing that I need to change in the seasonal setup to allow the model to estimate missing years or forecast into future years? Is it something to do with the PredTF_i variable while fitting the model? I’d appreciate any help.

Below is the seasonal model from the wiki where I have made the catch for 2013 (all seasons) NA to mimic missing data for that year.

Additionally, while trying some other methods where I exclude the empty forecasting years in the main dataset, but leave a spot for them while making the Dummy Data section I receive the same error. So it seems it has something to do with that part of the code.

library(VAST)
example = load_example( data_set="NWA_yellowtail_seasons" )

##Made the weight NA to force the model to estimate in year 2013
example$sampling_data$weight[example$sampling_data$year==2013]<-NA

##Load data and quick exploration of structure
##Set of years and seasons. The DFO spring survey usually occurs before the NOAA NEFSC spring survey, so ordering accordingly.
year_set = sort(unique(example$sampling_data[,'year']))
season_set = c("DFO", "SPRING", "FALL")

##Create a grid with all unique combinations of seasons and years and then combine these into one "year_season" variable
yearseason_grid = expand.grid("season" = season_set, "year" = year_set)
yearseason_levels = apply(yearseason_grid[,2:1], MARGIN = 1, FUN = paste, collapse = "_")
yearseason_labels = round(yearseason_grid[,'year'] + (as.numeric(factor(yearseason_grid[,'season'], levels = season_set))-1)/length(season_set), digits=1)

##Similar process, but for the observations
yearseason_i = apply(example$sampling_data[,c("year","season")], MARGIN = 1, FUN = paste, collapse = "_")
yearseason_i = factor(yearseason_i, levels = yearseason_levels)

##Add the year_season factor column to our sampling_data data set
example$sampling_data$year_season = yearseason_i
example$sampling_data$season = factor(example$sampling_data$season, levels = season_set)

##Some last processing steps
example$sampling_data = example$sampling_data[, c("year", "season", "year_season", "latitude", "longitude", "swept", "weight")]

##Make dummy observation for each season-year combination
dummy_data = data.frame(
year = yearseason_grid[,'year'],
season = yearseason_grid[,'season'],
year_season = yearseason_levels,
latitude = mean(example$sampling_data[,'latitude']),
longitude = mean(example$sampling_data[,'longitude']),
swept = mean(example$sampling_data[,'swept']),
weight = 0,
dummy = TRUE)

##Combine with sampling data
full_data = rbind(cbind(example$sampling_data, dummy = FALSE), dummy_data)

##Create sample data
samp_dat = data.frame(
"year_season" = as.numeric(full_data$year_season)-1,
"Lat" = full_data$latitude,
"Lon" = full_data$longitude,
"weight" = full_data$weight,
"Swept" = full_data$swept,
"Dummy" = full_data$dummy )

##Covariate data. Note here, case sensitive!
cov_dat = data.frame(
"Year" = as.numeric(full_data$year_season)-1,
"Year_Cov" = factor(full_data$year, levels = year_set),
"Season" = full_data$season,
"Lat" = full_data$latitude,
"Lon" = full_data$longitude )

##Make settings
settings = make_settings(n_x = 100,
Region = example$Region,
strata.limits = example$strata.limits,
purpose = "index2",
FieldConfig = c("Omega1" = 1, "Epsilon1" = 1, "Omega2" = 1, "Epsilon2" = 1),
RhoConfig = c("Beta1" = 3, "Beta2" = 3, "Epsilon1" = 4, "Epsilon2" = 4),
ObsModel = c(1, 1),
bias.correct = FALSE,
Options = c('treat_nonencounter_as_zero' = FALSE) )

##Creating model formula
formula_use = ~ Season + Year_Cov

##Implement corner constraint for linear effect but not spatially varying effect:
##* one level for each term is 2 (just spatially varying)
##* all other levels for each term is 3 (spatialy varying plus linear effect)
X1config_cp_use = matrix( c(2, rep(3,nlevels(cov_dat$Season)-1), 2, rep(3,nlevels(cov_dat$Year_Cov)-1) ), nrow=1 )
X2config_cp_use = matrix( c(2, rep(3,nlevels(cov_dat$Season)-1), 2, rep(3,nlevels(cov_dat$Year_Cov)-1) ), nrow=1 )

fit_orig = fit_model("settings" = settings,
"Lat_i" = samp_dat[, 'Lat'],
"Lon_i" = samp_dat[, 'Lon'],
"t_i" = samp_dat[, 'year_season'],
"b_i" = samp_dat[, 'weight'],
"a_i" = samp_dat[, 'Swept'],
"X1config_cp" = X1config_cp_use,
"X2config_cp" = X2config_cp_use,
"covariate_data" = cov_dat,
"X1_formula" = formula_use,
"X2_formula" = formula_use,
"X_contrasts" = list(Season = contrasts(cov_dat$Season, contrasts = FALSE), Year_Cov = contrasts(cov_dat$Year_Cov, contrasts = FALSE)),
"run_model" = FALSE,
"PredTF_i" = samp_dat[, 'Dummy'] )

##Adjust mapping for log_sigmaXi and fitting final model -- pool variance for all seasons and then set year's to NA
Map_adjust = fit_orig$tmb_list$Map

##Pool variances for each term to a single value
Map_adjust$log_sigmaXi1_cp = factor(c(rep(as.numeric(Map_adjust$log_sigmaXi1_cp[1]), nlevels(cov_dat$Season)),
rep(as.numeric(Map_adjust$log_sigmaXi1_cp[nlevels(cov_dat$Season)+1]), nlevels(cov_dat$Year_Cov))))
Map_adjust$log_sigmaXi2_cp = factor(c(rep(as.numeric(Map_adjust$log_sigmaXi2_cp[1]), nlevels(cov_dat$Season)),
rep(as.numeric(Map_adjust$log_sigmaXi2_cp[nlevels(cov_dat$Season)+1]), nlevels(cov_dat$Year_Cov))))

##Fit final model with new mapping
fit = fit_model("settings" = settings,
"Lat_i" = samp_dat[, 'Lat'],
"Lon_i" = samp_dat[, 'Lon'],
"t_i" = samp_dat[, 'year_season'],
"b_i" = samp_dat[, 'weight'],
"a_i" = samp_dat[, 'Swept'],
"X1config_cp" = X1config_cp_use,
"X2config_cp" = X2config_cp_use,
"covariate_data" = cov_dat,
"X1_formula" = formula_use,
"X2_formula" = formula_use,
"X_contrasts" = list(Season = contrasts(cov_dat$Season, contrasts = FALSE), Year_Cov = contrasts(cov_dat$Year_Cov, contrasts = FALSE)),
"newtonsteps" = 1,
"PredTF_i" = samp_dat[, 'Dummy'],
"Map" = Map_adjust )

@aallyn
Copy link
Contributor

aallyn commented Sep 1, 2022

Hey Brittany -- hopefully some others chime in and double check this. I think what is happening here is that the model is going to try to estimate every year in the covariate dataframe (except the first one) as a spatially-varying + linear effect rep(3,nlevels(cov_dat$Year_Cov)-1) ). So, in the example above, there is going to be a gamma1/2 parameter for the 2013 year, but there is no sample data to estimate it. Consequently, you get the error about the starting gradient being 0. I think you can confirm this if you set run_model = FALSE in your last call to fit_model and then do the following:

ParHat = TMBhelper:::extract_fixed(fit$tmb_list$Obj)
Gr = fit$tmb_list$Obj$gr( ParHat )
which(Gr == 0)

If that does end up being the case, I am not sure the best way forward and it might depend on the application. When we bumped up against a similar issue, Jim mentioned we could consider removing the year term and just letting the variability get soaked up by the season and spatio-temporal year-season random effects. Beyond that, I think other potential options would need to be implemented after fitting the model -- in other words not using the PredTF_i flag -- and would involve coming up with some way to provide year effect values for the forecast years based on those that were estimated during the model fitting. Hope that helps some!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants