Question about model error structure #354

Open
tsebens opened this issue Dec 13, 2022 · 1 comment


tsebens commented Dec 13, 2022

Less of an issue and more of a request for clarification: when I use e to specify subsets of the data that should be fitted with separate error distributions (which are then specified in ObsModel), how does VAST handle this? Does it split the dataset and fit multiple instances of the model, or does it fit a single model whose likelihood function takes a different form depending on the data subset?

Collaborator

agruss2 commented Dec 14, 2022

Hello,

When you fit a VAST model to multiple data types, and therefore need to specify multiple distributions in VAST, you need to:
(1) Include a “Data_type” column in your dataset.
(2) Modify the “ObsModel” object in VAST so that it has one row per data type instead of just one row.
(3) Include a data type catchability factor in the first linear predictor of your VAST model (preferably specified as a fixed effect):

catchability_data = my_dataset[,'Data_type',drop = FALSE]
Q1_formula = ~ factor( Data_type )
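The three steps above can be put together in base R roughly as follows. This is a minimal sketch for case (2) below (count plus biomass-sampling data): the data frame `my_dataset`, its `Catch` column, and the index `e_i` are hypothetical, and `e_i` is only an assumed stand-in for the "e" mentioned in the question (a zero-based index pointing each observation to its row of ObsModel).

```r
# Hypothetical example data: 'my_dataset' and its columns are illustrative only
my_dataset <- data.frame(
  Catch     = c(12, 0, 3.4, 0, 7),
  Data_type = c("Count", "Count", "Biomass", "Biomass", "Count")
)

# Step 1: a Data_type column stored as a factor whose level order
# matches the row order of ObsModel (here: Count first, then Biomass)
my_dataset$Data_type <- factor(my_dataset$Data_type, levels = c("Count", "Biomass"))

# Step 2: one ObsModel row per data type; column 1 selects the distribution
# for that data type, column 2 = 1 selects the Poisson-link delta model
ObsModel <- cbind(c(14, 2), 1)
stopifnot(nrow(ObsModel) == nlevels(my_dataset$Data_type))

# Step 3: data type catchability in the first linear predictor
catchability_data <- my_dataset[, 'Data_type', drop = FALSE]
Q1_formula <- ~ factor(Data_type)

# Assumption: zero-based index linking each observation to its ObsModel row
e_i <- as.integer(my_dataset$Data_type) - 1L
```

These objects would then be passed to VAST's settings and model-fitting functions; check the VAST documentation for the exact argument names rather than relying on this sketch.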

Let us see how this works in practice (in each case, the order of the “Data_type” levels matches the row order of ObsModel):
(1) If you work only with biomass-sampling data (or any data type that can take any non-negative real number), then you will not need a “Data_type” column in your dataset, and you will set ObsModel to c( 2, 1 ).
(2) If you work with biomass-sampling data and count data (or any data type that can take any non-negative integer), then you will need to: include a “Data_type” column in your dataset, with levels “Count” and “Biomass”, in this order; and set ObsModel to cbind( c( 14, 2 ), 1 ).
(3) If you work with biomass-sampling data and encounter/non-encounter data, then you will need to: include a “Data_type” column in your dataset, with levels “Encounter” and “Biomass”, in this order; and set ObsModel to cbind( c( 13, 2 ), 1 ).
(4) If you work with count data and encounter/non-encounter data, then you will need to: include a “Data_type” column in your dataset, with levels “Encounter” and “Count”, in this order; and set ObsModel to cbind( c( 13, 14 ), 1 ).
(5) If you work with biomass-sampling data, count data and encounter/non-encounter data, then you will need to: include a “Data_type” column in your dataset, with levels “Encounter”, “Count” and “Biomass”, in this order; and set ObsModel to cbind( c( 13, 14, 2 ), 1 ).
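The five cases follow one pattern: the levels keep the order Encounter, Count, Biomass, and each level contributes one ObsModel row (13, 14 or 2 in column 1, and 1 in column 2). The helper below is a hypothetical sketch of that lookup, written only to make the pattern explicit; `obs_codes` and `obs_model_for` are illustrative names, not part of VAST, and combinations outside cases (1) to (5) are deliberately rejected.

```r
# Distribution codes for ObsModel[, 1], as listed in cases (1)-(5),
# stored in the required level order: Encounter, Count, Biomass
obs_codes <- c(Encounter = 13, Count = 14, Biomass = 2)

# Hypothetical helper: Data_type level order and ObsModel for a set of data types
obs_model_for <- function(types) {
  stopifnot(all(types %in% names(obs_codes)))
  # Keep only the requested types, in the required order
  levels_ordered <- names(obs_codes)[names(obs_codes) %in% types]
  if (identical(levels_ordered, "Biomass")) {
    # Case (1): biomass only, so no Data_type column is needed
    return(list(levels = NULL, ObsModel = c(2, 1)))
  }
  if (length(levels_ordered) < 2) {
    stop("Only the combinations in cases (1)-(5) are covered")
  }
  # Cases (2)-(5): one row per data type, Poisson-link delta model in column 2
  list(levels = levels_ordered,
       ObsModel = cbind(unname(obs_codes[levels_ordered]), 1))
}

obs_model_for(c("Biomass", "Encounter"))
```

Calling it with the data types in any order returns the levels in the order the cases above require, for example Encounter before Biomass for case (3).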

When your VAST model is fitted to multiple data types, the likelihoods for the different data types (e.g., encounter/non-encounter, count and biomass-sampling data) share parameters, because you are using a Poisson-link delta model (ObsModel[,2] is set to 1). Consequently, VAST does not split the dataset: a single VAST model is fitted to all the data, and its likelihood is the product of the likelihoods for the individual data types.
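To illustrate what "a single model whose likelihood is a product over data types" means, here is a toy sketch for case (4) in base R. It is a deliberate simplification and not VAST's implementation: `joint_nll` and its two parameters are hypothetical, spatial and random effects are omitted, and a plain Poisson process (expected count lambda, encounter probability 1 - exp(-lambda)) stands in for the Poisson-link delta model. The point is that one parameter vector is estimated from both data types at once, with the data type catchability playing the role of `Q1_formula = ~ factor( Data_type )`.

```r
# Toy joint negative log-likelihood for encounter + count data (illustration only).
# Assumed model: lambda_i = exp(beta0 + q[type_i]); a count is Poisson(lambda_i),
# and an encounter occurs with probability 1 - exp(-lambda_i).
joint_nll <- function(par, y, type) {
  beta0 <- par[1]
  # Data type catchability; in this toy, "Encounter" is the reference level
  q <- c(Encounter = 0, Count = par[2])
  lambda <- exp(beta0 + q[as.character(type)])
  is_enc <- type == "Encounter"
  # Encounter/non-encounter observations: Bernoulli log-likelihood
  ll_enc <- dbinom(y[is_enc], size = 1, prob = 1 - exp(-lambda[is_enc]), log = TRUE)
  # Count observations: Poisson log-likelihood
  ll_cnt <- dpois(y[!is_enc], lambda = lambda[!is_enc], log = TRUE)
  # Product of likelihoods = sum of log-likelihoods across data types
  -(sum(ll_enc) + sum(ll_cnt))
}

type <- factor(c("Encounter", "Encounter", "Encounter", "Count", "Count", "Count"),
               levels = c("Encounter", "Count"))
y <- c(1, 0, 1, 4, 2, 0)

# One optimisation over one parameter vector, using all observations together
fit <- optim(c(0, 0), joint_nll, y = y, type = type)
fit$par
```

Because this toy has one free parameter per data type, its optimum can be checked by hand: beta0 should be close to log(log(3)) (two encounters out of three) and beta0 + q should be close to log(2) (mean count of 2).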
