Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problems estimating GPP and ER in artifical, non-mixing containers #378

Open
mluerig opened this issue Jan 28, 2019 · 3 comments
Open

Problems estimating GPP and ER in artifical, non-mixing containers #378

mluerig opened this issue Jan 28, 2019 · 3 comments

Comments

@mluerig
Copy link

mluerig commented Jan 28, 2019

Dear Allison and Bob,

last year I asked you for some help regarding estimating metabolism in replicated cattle tank [1000l] ecosystems (#367). With your help I was able to resolve most of the problems I had with these short timeseries.

However, I have just started to work on data from bigger ponds [15000l], where the timeseries are considerably longer (150-200 days), but with similar data structure (15 min resolution, measurements of O2 concentration and saturation, temperature and light).

As last time, I have smoothed my data (temp, O2, and light with a loess smoother, 1 day window size), but I am running similar issues as last time: positive ER estimates and "chopped off peaks" in the DO predictions (see plots below). Looking at the DO predictions, the results are not terrible, but I really don't get the positive ER estimates. So, there are some specific questions I have:

  • how much does the time window size (=number of consecutive days) affect the estimates?
  • and related, over a longer timestretch, day_start and day_end I specify will differ for a lot of my days. why is this parameter so important (has a big effect on the model estimates), and, is there a way to specify it dynamically (or chop up the timeseries)?
  • regarding positive ER estimates you mentioned last time that :

If the nighttime oversaturation is real, you may simply need to interpret the ER values as being ER minus the rate of O2 pushing its way into the water from bubbles.

do you think there is a way to calculate this gas transfer from my data?

  • can I still interpret my GPP estimates meaningfully even if ER is positive / not meaningful, i.e. just use GPP and ignore ER? I was thinking of this because of Bob's comment:

ER >> GPP is weird unless you are adding external C or the tanks have high ER from productivity that occurred before you measure metabolism. But ER is difficult to measure.

Below is my model output, and the plotted metabolism estimates. I realize that this is a pretty specific problem, and not a bug or generalizeable issue and you probably cannot spend a lot of time on it. Any thoughts are highly appreciated - thanks for your help!

mod.data.1.txt

do_preds1
metab_preds

metab_model of type metab_bayes 
streamMetabolizer version 0.10.9 
Specifications:
  model_name                 b_Kn_oipi_tr_plrckm.stan                                                                                               
  engine                     stan                                                                                                                   
  split_dates                FALSE                                                                                                                  
  keep_mcmcs                 TRUE                                                                                                                   
  keep_mcmc_data             TRUE                                                                                                                   
  day_start                  4                                                                                                                      
  day_end                    28                                                                                                                     
  day_tests                  full_day, even_timesteps, complete_data, pos_discharge                                                                 
  required_timestep          NA                                                                                                                     
  GPP_daily_mu               3.1                                                                                                                    
  GPP_daily_lower            -Inf                                                                                                                   
  GPP_daily_sigma            6                                                                                                                      
  ER_daily_mu                -7.1                                                                                                                   
  ER_daily_upper             Inf                                                                                                                    
  ER_daily_sigma             7.1                                                                                                                    
  K600_daily_meanlog_meanlog 0                                                                                                                      
  K600_daily_meanlog_sdlog   1                                                                                                                      
  K600_daily_sdlog_sigma     0.05                                                                                                                   
  err_obs_iid_sigma_scale    0.03                                                                                                                   
  err_proc_iid_sigma_scale   5                                                                                                                      
  params_in                  GPP_daily_mu, GPP_daily_lower, GPP_daily_sigma, ER_daily_mu, ER_daily_upper, ER_daily_sigma, K600_daily_meanlog_mean...
  params_out                 GPP, ER, GPP_daily, ER_daily, K600_daily, K600_daily_predlog, K600_daily_sdlog, err_obs_iid_sigma, err_obs_iid, err_...
  n_chains                   4                                                                                                                      
  n_cores                    8                                                                                                                      
  burnin_steps               500                                                                                                                    
  saved_steps                500                                                                                                                    
  thin_steps                 1                                                                                                                      
  verbose                    FALSE                                                                                                                  
  model_path                 C:/R/library/streamMetabolizer/models/b_Kn_oipi_tr_plrckm.stan                                                         
Fitting time: 198.8 secs elapsed
Parameters (23 dates):
         date     GPP.daily GPP.daily.lower GPP.daily.upper        ER.daily  ER.daily.lower ER.daily.upper    K600.daily K600.daily.lower
1  2016-06-25 1.9235106940     1.5546260028    2.1068155693 -1.78593622468  -2.837732608558  -1.0475971979 0.5172439444      0.2322636465
2  2016-06-26 2.0194641199     1.8447006955    2.1919723116  0.44287583521  -0.076361921056   1.0134568470 0.7380909758      0.5514256920
3  2016-06-27 2.4101872500     2.1472031022    2.5957082538  0.05336073694  -0.607526741711   0.6696580081 0.5065047291      0.3700766107
4  2016-06-28 2.6924826183     2.4266813080    2.9042982917  1.57862336017   0.801652646936   2.5567539834 0.8355450635      0.6919972033
5  2016-06-29 3.6138749359     3.4647739204    3.8439259025 -0.85835427863  -1.549463219613  -0.2913297211 0.4488613663      0.3298092030
6  2016-06-30 3.4723441818     3.2452894163    3.6482994156  4.07405399552   3.166775146038   6.3090806256 1.5998213051      1.3896263063
7  2016-07-01 2.6771336529     2.4864780740    2.8529340979  3.83484149751   3.145690219061   4.9406817883 1.0173423554      0.8997480120
8  2016-07-02 1.4480214638     1.2474434705    1.6642115617 -1.24495195958  -1.834716672553  -0.3954006420 0.3561904395      0.2347237976
9  2016-07-03 1.9599520363     1.8022373147    2.1294993926  0.62327563856   0.001476605237   1.1489087127 0.3082071348      0.1816466209
10 2016-07-04 4.7817022873     4.6294262743    4.9473200151  0.08300143154  -0.700073669186   0.8252816594 0.5873964411      0.4657374708
   K600.daily.upper msgs.fit
1      0.7162728375  w      
2      0.9517103198  w      
3      0.6543058953  w      
4      1.0289028763  w      
5      0.5682756388  w      
6      2.0399464124  w      
7      1.2513357040  w      
8      0.5309216874  w      
9      0.4426556606  w      
10     0.6887456752  w      
  ...
Fitting warnings:
  There were 31 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
  There were 4 chains where the estimated Bayesian Fraction of Missing Information was low. See
http://mc-stan.org/misc/warnings.html#bfmi-low
  Examine the pairs() plot to diagnose sampling problems
  23 dates: overall warnings
Fitting errors:
  1 date: data don't end when expected
Predictions (23 dates):
# A tibble: 10 x 9
   date         GPP GPP.lower GPP.upper      ER ER.lower ER.upper msgs.fit  msgs.pred
   <date>     <dbl>     <dbl>     <dbl>   <dbl>    <dbl>    <dbl> <chr>     <chr>    
 1 2016-06-25  1.92      1.55      2.11 -1.79   -2.84      -1.05  "w      " "       "
 2 2016-06-26  2.02      1.84      2.19  0.443  -0.0764     1.01  "w      " "       "
 3 2016-06-27  2.41      2.15      2.60  0.0534 -0.608      0.670 "w      " "       "
 4 2016-06-28  2.69      2.43      2.90  1.58    0.802      2.56  "w      " "       "
 5 2016-06-29  3.61      3.46      3.84 -0.858  -1.55      -0.291 "w      " "       "
 6 2016-06-30  3.47      3.25      3.65  4.07    3.17       6.31  "w      " "       "
 7 2016-07-01  2.68      2.49      2.85  3.83    3.15       4.94  "w      " "       "
 8 2016-07-02  1.45      1.25      1.66 -1.24   -1.83      -0.395 "w      " "       "
 9 2016-07-03  1.96      1.80      2.13  0.623   0.00148    1.15  "w      " "       "
10 2016-07-04  4.78      4.63      4.95  0.0830 -0.700      0.825 "w      " "       "
  ...

@aappling-usgs
Copy link
Contributor

Hi Moritz,

how much does the time window size (=number of consecutive days) affect the estimates?

consecutive days don't matter, but when you model a greater total number of days at once, that can improve the model's estimation of the distribution of K600 values, error magnitudes, and other parameters. If you expect that K600 is varying over time in these tanks (due to changing wind speeds, or periods when macrophytes greatly reduce circulation, maybe?), you might want to use a pooling approach other than Kn, which assumes there is some fixed distribution that applies to all time. I think the piecewise-linear approach (Kb) could be adapted to relate K to some other parameter such as mean daily wind speed, plant biomass, or even date if desired.

and related, over a longer timestretch, day_start and day_end I specify will differ for a lot of my days. why is this parameter so important (has a big effect on the model estimates), and, is there a way to specify it dynamically (or chop up the timeseries)?

I know you've found that effect, and it continues to impress and surprise me, though I can't yet explain it. streamMetabolizer does not support different values of day_start and day_end on different days within the same model. One solution would be to run a different model for each unique set of values of day_start and day_end that you want to use. Another might be to change the solar.time stamps on your data, falsifying them so that the 24-hour windows you actually want always have the same series of time labels. streamMetabolizer only uses the solar.time column to identify the chunk of data appropriate to each date, so you wouldn't be messing up the modeling by doing this.

regarding positive ER estimates you mentioned last time that "If the nighttime oversaturation is real, you may simply need to interpret the ER values as being ER minus the rate of O2 pushing its way into the water from bubbles." do you think there is a way to calculate this gas transfer from my data?

For an inverse approach alone to be able to tease apart ER and bubbles, we'd need some information on how the temporal patterns in those two variables differed - and they would need to differ enough for the model to be able to distinguish them. However, you might be able to use field measurements to trap bubbles and estimate their abundance and contribution, after which you could simply subtract that contribution from the "ER" estimated by the model if it's safe to assume that both processes are reasonably constant over each 24-hour period.

can I still interpret my GPP estimates meaningfully even if ER is positive / not meaningful, i.e. just use GPP and ignore ER? I was thinking of this because of Bob's comment: "ER >> GPP is weird unless you are adding external C or the tanks have high ER from productivity that occurred before you measure metabolism. But ER is difficult to measure."

Yes, I think the model can probably provide accurate GPP estimates even when the ER estimates are contaminated, as long as the contamination (bubbles in this case) is constant throughout the day and night.

@mluerig
Copy link
Author

mluerig commented Feb 11, 2019

when you model a greater total number of days at once, that can improve the model's estimation of the distribution of K600 values, error magnitudes, and other parameters. If you expect that K600 is varying over time in these tanks (due to changing wind speeds, or periods when macrophytes greatly reduce circulation, maybe?), you might want to use a pooling approach other than Kn, which assumes there is some fixed distribution that applies to all time. I think the piecewise-linear approach (Kb) could be adapted to relate K to some other parameter such as mean daily wind speed, plant biomass, or even date if desired.

I actually would not expect K600 to vary all that much. it has been shown that the ponds are relatively stratified, with only very little mixing from wind etc.. maybe there is a way to completely fix k600?

One solution would be to run a different model for each unique set of values of day_start and day_end that you want to use.

yes, that's what I thought as well. I will try with different windows sizes and compare the results. what do you think a sensible minimum window-size would be - something like 3 days?

Another might be to change the solar.time stamps on your data, falsifying them so that the 24-hour windows you actually want always have the same series of time labels. streamMetabolizer only uses the solar.time column to identify the chunk of data appropriate to each date, so you wouldn't be messing up the modeling by doing this.

here I would simply align a solar.time stamp with the sunrise and make that day_start , e.g.: sunrise is at 05:33, so I "lag" solar.time in all following days so that first light is in 5:33 as well?

estimating bubble dynamics won't work for this data set, we don't have the capacity to do that at the moment. but I think this is a really important point for future work using ecosystems in tank: in all experiments I have done so far there was pronounced supersaturation that led to weird patterns

@aappling-usgs
Copy link
Contributor

I actually would not expect K600 to vary all that much. it has been shown that the ponds are relatively stratified, with only very little mixing from wind etc.. maybe there is a way to completely fix k600?

We designed the Bayesian models with a strong focus on supporting estimation rather than fixing of K. But you can get pretty close by using the Kn0 specification (pool_K600='normal_sdzero'), like so:

sp <- specs(mm_name('bayes', pool_K600='normal_sdzero')) %>%
  revise(K600_daily_meanlog_meanlog = log(11), # set this to the natural log of the value you want to fix K600 at
         K600_daily_meanlog_sdlog = 0.001) # make this small to force K600_daily_meanlog to stick to the meanlog_meanlog

yes, that's what I thought as well. I will try with different windows sizes and compare the results. what do you think a sensible minimum window-size would be - something like 3 days?

Sure, if you want the start and end times to drift slowly over time, a moving window of 3 consecutive days at a time sounds like a fine thing to try. There's no particular need to make them be consecutive days except to the extent that those three days are likely to have similar magnitudes of errors and metabolism.

here I would simply align a solar.time stamp with the sunrise and make that day_start , e.g.: sunrise is at 05:33, so I "lag" solar.time in all following days so that first light is in 5:33 as well?

I think we're talking about the same thing, yes. For a very simple example, suppose you have observations on the hour every hour, and you want day 1 to run from 4 am to 4 am, day 2 to run from 5 am to 5am, and day 3 to run from 6 am to 6 am. You could use day_start = 4, day_end=28, and then adjust the solar times as follows:

library(tidyverse) 

day_start = 4
day_end = 28

dat <-
  tibble(
    obs.id = 1:75,
    true.solar.time = seq(as.POSIXct('2019-02-12 04:00'), as.POSIXct('2019-02-15 06:00'), by=as.difftime(1, units='hours')),
    date.id = c(rep(1, 24), NA, rep(2, 24), NA, rep(3, 25))) %>%
  filter(!is.na(date.id)) %>%
  group_by(date.id) %>%
  mutate(new.solar.time = as.POSIXct('2019-02-11 00:00') +
           as.difftime(date.id*24 + seq(day_start, by=1, length.out=n()), units='hours')) %>%
  ungroup() %>%
  mutate(date.id = factor(date.id))

attr(dat$true.solar.time, 'tzone') <- NULL
dat %>%
  gather(time.type, solar.time, true.solar.time, new.solar.time) %>%
  ggplot(aes(x=obs.id, y=solar.time, color=time.type, shape=date.id)) +
  geom_point() +
  theme_bw()

image

Thus, the timestamps on day 2 are moved back by 1 hour, and the timestamps on day 3 are moved back by 2 hours.

estimating bubble dynamics won't work for this data set, we don't have the capacity to do that at the moment. but I think this is a really important point for future work using ecosystems in tank: in all experiments I have done so far there was pronounced supersaturation that led to weird patterns

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants