
Setting priors for K #373

Open
samlbickley opened this issue Oct 1, 2018 · 10 comments
@samlbickley

I have previously measured reaeration rates that I would like to use as priors, but my problem is I don't know how to set that in the code. I have a k rate of 66/day. In issue #367, Bob Hall said to use the following specs:

K600_daily_meanlog_meanlog = 0
K600_daily_meanlog_sdlog = 1

I don't have a background in Bayesian models, so I don't quite understand what these specs mean. In issue #367, I believe the person had very low k600, so I interpret Bob's specs as limiting the modeled k600 to between 0 and 1. Is that what's going on?

Furthermore, what is the best way to convert k to k600? I assume that my measured reaeration rate isn't the standardized gas exchange rate and must be converted somehow.

tldr:
I have measured reaeration rates that I would like to use as priors for K600 but don't know how to specify that properly in the code.

Thanks!

library(streamMetabolizer)

setwd("C:/Users/slb0035/Google Drive/School/Working/METABOLISM/streamMetabolizer/INPUT/")

data<-read.csv("sb4_spring2018.csv", stringsAsFactors = FALSE)
data$solar.time<- as.POSIXct(data$solar.time, tz='America/New_York')
lubridate::tz(data$solar.time)
data$solar.time <- streamMetabolizer::calc_solar_time(data$solar.time, longitude=-85) #FBMI longitude

calc_light(data$solar.time, 32, -85)
library(unitted)
data$light <- calc_light(u(data$solar.time), u(32, 'degN'), u(-85, 'degE'), u(3004, 'umol m^-2 s^-1'), attach.units=TRUE)

bayes_name <- mm_name(type='bayes', pool_K600="normal", err_obs_iid=TRUE, err_proc_iid=TRUE)
bayes_specs <- specs(bayes_name, K600_daily_meanlog_meanlog=0, K600_daily_meanlog_sdlog=66) 


mm <- metab(bayes_specs, data=data)
output<-as.data.frame(get_params(mm))
write.csv(output, 'sb4_spring2018-output_0-6_kmany.csv') #whatever filename is

pdf('plot_DO_preds_sb4_spring2018output_0-2_kmany.pdf')
plot_DO_preds(mm)
dev.off()

pdf('plot_metab_preds_sb4_spring2018output_0-2_kmany.pdf')
plot_metab_preds(mm)
dev.off()

mcmc <- get_mcmc(mm)
pdf('mcmc_sb4_spring2018output_0-2_kmany.pdf')
rstan::traceplot(mcmc, pars='K600_daily', nrow=3)
dev.off()



Session info ---------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.5.1 (2018-07-02)
 system   x86_64, mingw32             
 ui       RStudio (1.1.456)           
 language (EN)                        
 collate  English_United States.1252  
 tz       America/Chicago             
 date     2018-10-01                  

Packages -------------------------------------------------------------------------------------------------
 package           * version   date       source        
 assertthat          0.2.0     2017-04-11 CRAN (R 3.5.1)
 base              * 3.5.1     2018-07-02 local         
 bindr               0.1.1     2018-03-13 CRAN (R 3.5.1)
 bindrcpp          * 0.2.2     2018-03-29 CRAN (R 3.5.1)
 bitops              1.0-6     2013-08-17 CRAN (R 3.5.0)
 codetools           0.2-15    2016-10-05 CRAN (R 3.5.1)
 colorspace          1.3-2     2016-12-14 CRAN (R 3.5.1)
 compiler            3.5.1     2018-07-02 local         
 crayon              1.3.4     2017-09-16 CRAN (R 3.5.1)
 curl                3.2       2018-03-28 CRAN (R 3.5.1)
 datasets          * 3.5.1     2018-07-02 local         
 deSolve             1.21      2018-05-09 CRAN (R 3.5.0)
 devtools            1.13.6    2018-06-27 CRAN (R 3.5.1)
 digest              0.6.17    2018-09-12 CRAN (R 3.5.1)
 dplyr               0.7.6     2018-06-29 CRAN (R 3.5.1)
 ggplot2           * 3.0.0     2018-07-03 CRAN (R 3.5.1)
 glue                1.3.0     2018-07-17 CRAN (R 3.5.1)
 graphics          * 3.5.1     2018-07-02 local         
 grDevices         * 3.5.1     2018-07-02 local         
 grid                3.5.1     2018-07-02 local         
 gridExtra           2.3       2017-09-09 CRAN (R 3.5.1)
 gtable              0.2.0     2016-02-26 CRAN (R 3.5.1)
 httr                1.3.1     2017-08-20 CRAN (R 3.5.1)
 inline              0.3.15    2018-05-18 CRAN (R 3.5.1)
 jsonlite            1.5       2017-06-01 CRAN (R 3.5.1)
 labeling            0.3       2014-08-23 CRAN (R 3.5.0)
 LakeMetabolizer     1.5.0     2016-06-23 CRAN (R 3.5.1)
 lazyeval            0.2.1     2017-10-29 CRAN (R 3.5.1)
 lubridate           1.7.4     2018-04-11 CRAN (R 3.5.1)
 magrittr            1.5       2014-11-22 CRAN (R 3.5.1)
 memoise             1.1.0     2017-04-21 CRAN (R 3.5.1)
 methods           * 3.5.1     2018-07-02 local         
 munsell             0.5.0     2018-06-12 CRAN (R 3.5.1)
 parallel            3.5.1     2018-07-02 local         
 pillar              1.3.0     2018-07-14 CRAN (R 3.5.1)
 pkgconfig           2.0.2     2018-08-16 CRAN (R 3.5.1)
 plyr                1.8.4     2016-06-08 CRAN (R 3.5.1)
 purrr               0.2.5     2018-05-29 CRAN (R 3.5.1)
 R6                  2.2.2     2017-06-17 CRAN (R 3.5.1)
 Rcpp                0.12.18   2018-07-23 CRAN (R 3.5.1)
 RCurl               1.95-4.11 2018-07-15 CRAN (R 3.5.1)
 reshape2            1.4.3     2017-12-11 CRAN (R 3.5.1)
 rLakeAnalyzer       1.11.4    2018-03-14 CRAN (R 3.5.1)
 rlang               0.2.2     2018-08-16 CRAN (R 3.5.1)
 rstan             * 2.17.3    2018-01-20 CRAN (R 3.5.1)
 rstudioapi          0.7       2017-09-07 CRAN (R 3.5.1)
 scales              1.0.0     2018-08-09 CRAN (R 3.5.1)
 StanHeaders       * 2.17.2    2018-01-20 CRAN (R 3.5.1)
 stats             * 3.5.1     2018-07-02 local         
 stats4              3.5.1     2018-07-02 local         
 streamMetabolizer * 0.10.9    2018-05-09 local         
 stringi             1.1.7     2018-03-12 CRAN (R 3.5.0)
 stringr             1.3.1     2018-05-10 CRAN (R 3.5.1)
 tibble              1.4.2     2018-01-22 CRAN (R 3.5.1)
 tidyr               0.8.1     2018-05-18 CRAN (R 3.5.1)
 tidyselect          0.2.4     2018-02-26 CRAN (R 3.5.1)
 tools               3.5.1     2018-07-02 local         
 unitted           * 0.2.9     2018-05-09 local         
 utils             * 3.5.1     2018-07-02 local         
 withr               2.1.2     2018-03-15 CRAN (R 3.5.1)
 XML                 3.98-1.16 2018-08-19 CRAN (R 3.5.1)
 yaml                2.2.0     2018-07-25 CRAN (R 3.5.1)

@aappling-usgs
Contributor

See this issue + answer for a way to fix K to a single number. Basically, you should pick a distribution for K600_daily_meanlog that is very narrow (a tiny K600_daily_meanlog_sdlog) and centered on your measured value (K600_daily_meanlog_meanlog = log(66), since the prior is on the log scale). Definitions of K600_daily_meanlog_meanlog and K600_daily_meanlog_sdlog are available at ?specs (online at https://rdrr.io/github/USGS-R/streamMetabolizer/man/specs.html).
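A minimal sketch of that spec for a measured K600 of 66/d (the exact sdlog value here is a judgment call; smaller values pin K600 more tightly to the prior center):

```r
library(streamMetabolizer)

# lognormal prior on daily K600, centered on the measured value;
# the prior is on the log scale, hence log(66)
bayes_name  <- mm_name(type = 'bayes', pool_K600 = 'normal',
                       err_obs_iid = TRUE, err_proc_iid = TRUE)
bayes_specs <- specs(bayes_name,
                     K600_daily_meanlog_meanlog = log(66),  # center at 66/d
                     K600_daily_meanlog_sdlog   = 0.001)    # very narrow prior
```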

A useful discussion of the relationships among various forms of k, k2, K, K600, etc. can be found in Raymond et al. 2012 (https://doi.org/10.1215/21573689-1597669).
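On the k-to-K600 conversion: the usual route (discussed in Raymond et al. 2012) is Schmidt-number scaling, K600 = K_gas * (600 / Sc_gas)^(-n), with n = 0.5 commonly used for turbulent streams. A sketch in R, assuming the measured rate is for O2 in fresh water and using the freshwater O2 Schmidt-number polynomial of Wanninkhof (1992); check the coefficients against the tables in Raymond et al. before relying on them:

```r
# Schmidt number for O2 in fresh water (Wanninkhof 1992), temp.C in degrees C
schmidt_O2 <- function(temp.C) {
  1800.6 - 120.10 * temp.C + 3.7818 * temp.C^2 - 0.047608 * temp.C^3
}

# convert a measured K(O2) (1/d) to K600 (1/d); n = 0.5 for turbulent water
K600_from_KO2 <- function(K_O2, temp.C, n = 0.5) {
  K_O2 * (600 / schmidt_O2(temp.C))^(-n)
}

K600_from_KO2(66, temp.C = 20)  # ~62 /d at 20 degrees C
```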

@samlbickley
Author

Thank you, Alison! That cleared things up for me.

What if I don't want to set K to a single number, but still want to use previously measured values to estimate K? Is that possible, or is that exactly what I've done?

@aappling-usgs
Contributor

Hi Sam, it's a bit trickier to set K to multiple numbers, but it can be done. If you have distinct Q on each day, or are willing to spoof your Q inputs, you can set up a very strong prior with a multi-node K~Q relationship relating your measured K values to the daily mean Q values on those days. This would mean using a Kb model, i.e., pool_K600='binned'. Spoofing Q should be fine because Q, unlike depth, is only used as a predictor of K, if it's used at all.

@samlbickley
Author

Thank you, Alison. You've been very helpful.

@samlbickley
Author

I think I closed this issue prematurely.

I'm not familiar with what a multi-node relationship is. Also, there doesn't appear to be any relationship between my K and Q values. Without a strong relationship between the two, is it possible to develop this multi-node relationship? Is there any example code for setting up the K ~ Q relationship?

@samlbickley reopened this Oct 4, 2018
@aappling-usgs
Contributor

By multi-node relationship I meant a piecewise linear relationship between K and a variable named Q or discharge, which is what's provided by a b_Kb*_... model (pool_K600='binned*' in mm_name()). See https://agupubs.onlinelibrary.wiley.com/doi/full/10.1002/2017JG004140 figure 5b for some examples.

Rather than passing in actual discharge values, you can pass in arbitrary values. For example, you can set Q to 1 for day 1, 2 for day 2, etc. in your data_daily that you're passing to metab().
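As a concrete sketch of that spoofed table (the dates here are placeholders, and `discharge.daily` is assumed to be the daily discharge column name streamMetabolizer expects in data_daily; see ?mm_data):

```r
# spoofed daily "discharge": 1 on day 1, 2 on day 2, ... (the values are
# arbitrary labels, since Q is only used here as a predictor of K)
n_days <- 9  # one per measured K600 value
data_daily <- data.frame(
  date = seq(as.Date("2018-04-01"), by = "1 day", length.out = n_days),
  discharge.daily = 1:n_days
)

# matching node centers for specs(): one node per spoofed Q value
nodes <- log(1:n_days)
```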

You can specify as many values of K600_lnQ_nodes_centers as you want in specs(). Set one value for every value of Q - in the above example, this would be log(1:n) where n is the number of days you're modeling.

You can set a tight prior on the value of K600 at each lnQ_node_center. Set K600_lnQ_nodes_meanlog to a vector of values as long as K600_lnQ_nodes_centers where the values are the logs of your measured/estimated values of K600 on each day.

Set a very small value for K600_lnQ_nodes_sdlog to encourage the model to stick tight to the values of K600_lnQ_nodes_meanlog you set.

You might need to relax K600_lnQ_nodediffs_sdlog (make it a higher value); this could involve some experimentation to see if it matters, or you could just make it really high from the outset.

Use pool_K600='binned_sdzero' if you want to [pretty nearly] fix the daily K600 values at the values you provide.

See ?specs for more information on each of the above specifications/parameters.

I've never done this before (my focus has always been on learning rather than fixing K) and don't have time to work up example code, but if you can get the above instructions to work (I really think they will), please do share your code here so that others can benefit.
Thanks,
Alison

@samlbickley
Author

I've attached my code, lnK and daily_Q files, and model outputs.

First, thank you for your help. I was able to get the model to run, but the estimated DO is quite wonky and I'm getting negative GPP estimates.

I've got low-GPP sites, and the model gives negative GPP for nearly all of my streams. AJ Reisinger told me that negative GPP on all dates probably indicates the ER estimates are not reliable and that there's just not enough GPP to model metabolism. We have measured K values from previous work, so I fixed K600 to the mean of the measured K600 values (thank you for your help!) and was able to get positive GPP and ER rates within the range previously measured. If I've correctly followed your instructions and am still getting negative GPP and basically unmodelable days, should I just stick with my fixed-K models, which are giving me reasonable estimates?

library(streamMetabolizer)

setwd("C:/Users/sam/Google Drive/School/Working/METABOLISM/streamMetabolizer/INPUT/")

data<-read.csv("sb4_spring2018.csv", stringsAsFactors = FALSE)
data$solar.time<- as.POSIXct(data$solar.time, tz='America/New_York')
lubridate::tz(data$solar.time)
data$solar.time <- streamMetabolizer::calc_solar_time(data$solar.time, longitude=-85) #FBMI longitude

calc_light(data$solar.time, 32, -85)
library(unitted)
data$light <- calc_light(u(data$solar.time), u(32, 'degN'), u(-85, 'degE'), u(3004, 'umol m^-2 s^-1'), attach.units=TRUE)

Q_daily<-read.csv("Q_daily.csv", stringsAsFactors = FALSE)
#Rather than passing in actual discharge values, you can pass in arbitrary values. 
#For example, you can set Q to 1 for day 1, 2 for day 2, etc. in your data_daily that you're passing to metab().
Q_daily$date <- as.Date(Q_daily$date,format="%Y-%m-%d")


lnk_input<-read.csv("sb4_lnK.csv", stringsAsFactors = FALSE)
#You can set a tight prior on the value of K600 at each lnQ_node_center. 
#Set K600_lnQ_nodes_meanlog to a vector of values as long as K600_lnQ_nodes_centers 
#where the values are the logs of your measured/estimated values of K600 on each day.
lnK<-lnk_input[,'HEADER']
#make the dataframe column a vector; named lnK to match the specs() call below

bayes_name <- mm_name(type='bayes', pool_K600="binned", err_obs_iid=TRUE, err_proc_iid=TRUE)
bayes_specs <- specs(bayes_name, K600_lnQ_nodes_centers = log(1:10), K600_lnQ_nodes_meanlog = lnK, K600_lnQ_nodes_sdlog = 0.00001, K600_lnQ_nodediffs_sdlog = 1000) 


mm <- metab(bayes_specs, data=data, data_daily=Q_daily)

Priors and outputs

@aappling-usgs
Contributor

Thanks for sharing this code, Sam.

I'm only seeing 9 values in sb4_lnK.csv, 8 of which get read by read.csv as non-header rows, but K600_lnQ_nodes_centers has length 10...is it possible that file is out of date relative to your code, or vice versa?

The K600.daily predictions don't look like a great match to exp(lnK), though maybe it's because I'm not looking at the right lnK values...is that what you found, too? That said, the match seems close enough that I'm surprised that you're getting negative GPP with this model but positive GPP (significantly above 0?) with a Bayesian model. Hmm.

If by "my fixed K models" you mean 2-parameter MLE models, I think it's plausible that those would work better for you. A major advantage of the Bayesian models is usually their ability to help you estimate K, so if you already have those numbers, the advantages relative to MLE models drop off substantially. (I do still expect some advantage because the Bayesian models can be state-space, including both process error and observation error, but K may well be the bigger issue for your data.) I would just caution against basing that decision entirely on whether your GPP values are coming out as you hope or not...that doesn't come across as especially sound science, and as AJ said, you need some signal (GPP relative to K) to pull truth out of a timeseries. If you have very high confidence in your outside estimates of K600, that's a better reason to use a fixed-K model than that the GPP is coming out positive.
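If trying the MLE route, a sketch (assuming, per ?metab_mle, that daily K600 values can be supplied via a K600.daily column in data_daily; the fixed value is the one used elsewhere in this thread, and `data` is the prepared input from the code above):

```r
# 2-parameter MLE model: GPP and ER are fitted each day, K600 is supplied
mle_name  <- mm_name(type = 'mle')
mle_specs <- specs(mle_name)

# fix K600 at the mean of the measured values
K_daily <- data.frame(date = unique(as.Date(data$solar.time)),
                      K600.daily = 3.477262235)

mm_mle <- metab(mle_specs, data = data, data_daily = K_daily)
```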

@aappling-usgs
Contributor

Also, are you sure K is as high as you say, and in the right units? Those DO predictions are abysmal, and I'm surprised the model couldn't at least find a better fit to the data even with high uncertainty in estimated GPP and ER values...

Also, you may need to run these Bayesian models for a longer warmup period; it looks from your traceplots like they don't all converge until later in the saved-steps period.

@samlbickley
Author

  1. There are 9 measured K values and I'm attempting to model 10 days of metabolism, so I assume this is the reason for the discrepancy. I have adjusted the number of modeled days to reflect the number of measured K values.

  2. Could you explain the importance of exp(lnK)? I don't understand your comment.

  3. My GPP estimates are not significantly above zero. The average GPP when I fixed K600 to the mean of the measured K600 values was 0.120825. I agree with your statement about sound science and not basing my decision on positive GPP alone, especially with GPP values so low.

  4. My "fixed K" model is a Bayesian model. Should I run the "fixed" model as an MLE model instead? The "fixed K" model I ran uses the following code:


library(streamMetabolizer)

setwd("C:/Users/slb0035/Google Drive/School/Working/METABOLISM/streamMetabolizer/INPUT/")

data<-read.csv("sb4_spring2018.csv", stringsAsFactors = FALSE)
data$solar.time<- as.POSIXct(data$solar.time, tz='America/New_York')
lubridate::tz(data$solar.time)
data$solar.time <- streamMetabolizer::calc_solar_time(data$solar.time, longitude=-85) #FBMI longitude

calc_light(data$solar.time, 32, -85)
library(unitted)
data$light <- calc_light(u(data$solar.time), u(32, 'degN'), u(-85, 'degE'), u(3004, 'umol m^-2 s^-1'), attach.units=TRUE)

bayes_name <- mm_name(type='bayes', pool_K600="normal", err_obs_iid=TRUE, err_proc_iid=TRUE)
bayes_specs <- specs(bayes_name, K600_daily_meanlog_meanlog=log(3.477262235), K600_daily_meanlog_sdlog=0.00001) 
#the above sets k600 to fixed value and sdlog keeps variation small

mm <- metab(bayes_specs, data=data)

  5. I'm confident in my measured reaeration rates, but I haven't had anyone check my calculations yet. Here are my K to K600 calculations, based on Raymond et al. 2012 and the formulas referenced therein.

  6. I doubled burn-in and saved-steps and got better MCMC convergence, but the DO predictions are just as terrible.

How you would you recommend going forward with such low-productivity streams? With GPP so close to zero (both positive and negative), can I say anything about metabolism in these systems with any confidence?
