Support for Stan model profiling? #41

mbjoseph · 2021-05-20T18:53:56Z

Does stantargets support profiling of Stan models? I'm getting mixed results: on a fresh build it seems I can access the profiles, but if a downstream target needs to be rebuilt, the profiles are no longer available. I put together a minimal example repository here: https://github.com/mbjoseph/stantargets-profile-ex

Here's a reprex to illustrate:

# first build succeeds in finding profiling info
targets::tar_destroy()
targets::tar_make()
#> • start target example_data
#> • built target example_data
#> • start target example_file_model
#> • built target example_file_model
#> • start target example_mcmc_model
#> Running MCMC with 4 sequential chains...
#> 
#> Chain 1 Iteration:    1 / 2000 [  0%]  (Warmup) 
#> Chain 1 Iteration:  100 / 2000 [  5%]  (Warmup) 
#> Chain 1 Iteration:  200 / 2000 [ 10%]  (Warmup) 
#> Chain 1 Iteration:  300 / 2000 [ 15%]  (Warmup) 
#> Chain 1 Iteration:  400 / 2000 [ 20%]  (Warmup) 
#> Chain 1 Iteration:  500 / 2000 [ 25%]  (Warmup) 
#> Chain 1 Iteration:  600 / 2000 [ 30%]  (Warmup) 
#> Chain 1 Iteration:  700 / 2000 [ 35%]  (Warmup) 
#> Chain 1 Iteration:  800 / 2000 [ 40%]  (Warmup) 
#> Chain 1 Iteration:  900 / 2000 [ 45%]  (Warmup) 
#> Chain 1 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
#> Chain 1 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
#> Chain 1 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
#> Chain 1 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
#> Chain 1 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
#> Chain 1 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
#> Chain 1 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
#> Chain 1 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
#> Chain 1 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
#> Chain 1 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
#> Chain 1 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
#> Chain 1 Iteration: 2000 / 2000 [100%]  (Sampling) 
#> Chain 1 finished in 0.0 seconds.
#> Chain 2 Iteration:    1 / 2000 [  0%]  (Warmup) 
#> Chain 2 Iteration:  100 / 2000 [  5%]  (Warmup) 
#> Chain 2 Iteration:  200 / 2000 [ 10%]  (Warmup) 
#> Chain 2 Iteration:  300 / 2000 [ 15%]  (Warmup) 
#> Chain 2 Iteration:  400 / 2000 [ 20%]  (Warmup) 
#> Chain 2 Iteration:  500 / 2000 [ 25%]  (Warmup) 
#> Chain 2 Iteration:  600 / 2000 [ 30%]  (Warmup) 
#> Chain 2 Iteration:  700 / 2000 [ 35%]  (Warmup) 
#> Chain 2 Iteration:  800 / 2000 [ 40%]  (Warmup) 
#> Chain 2 Iteration:  900 / 2000 [ 45%]  (Warmup) 
#> Chain 2 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
#> Chain 2 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
#> Chain 2 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
#> Chain 2 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
#> Chain 2 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
#> Chain 2 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
#> Chain 2 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
#> Chain 2 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
#> Chain 2 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
#> Chain 2 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
#> Chain 2 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
#> Chain 2 Iteration: 2000 / 2000 [100%]  (Sampling) 
#> Chain 2 finished in 0.0 seconds.
#> Chain 3 Iteration:    1 / 2000 [  0%]  (Warmup) 
#> Chain 3 Iteration:  100 / 2000 [  5%]  (Warmup) 
#> Chain 3 Iteration:  200 / 2000 [ 10%]  (Warmup) 
#> Chain 3 Iteration:  300 / 2000 [ 15%]  (Warmup) 
#> Chain 3 Iteration:  400 / 2000 [ 20%]  (Warmup) 
#> Chain 3 Iteration:  500 / 2000 [ 25%]  (Warmup) 
#> Chain 3 Iteration:  600 / 2000 [ 30%]  (Warmup) 
#> Chain 3 Iteration:  700 / 2000 [ 35%]  (Warmup) 
#> Chain 3 Iteration:  800 / 2000 [ 40%]  (Warmup) 
#> Chain 3 Iteration:  900 / 2000 [ 45%]  (Warmup) 
#> Chain 3 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
#> Chain 3 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
#> Chain 3 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
#> Chain 3 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
#> Chain 3 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
#> Chain 3 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
#> Chain 3 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
#> Chain 3 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
#> Chain 3 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
#> Chain 3 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
#> Chain 3 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
#> Chain 3 Iteration: 2000 / 2000 [100%]  (Sampling) 
#> Chain 3 finished in 0.0 seconds.
#> Chain 4 Iteration:    1 / 2000 [  0%]  (Warmup) 
#> Chain 4 Iteration:  100 / 2000 [  5%]  (Warmup) 
#> Chain 4 Iteration:  200 / 2000 [ 10%]  (Warmup) 
#> Chain 4 Iteration:  300 / 2000 [ 15%]  (Warmup) 
#> Chain 4 Iteration:  400 / 2000 [ 20%]  (Warmup) 
#> Chain 4 Iteration:  500 / 2000 [ 25%]  (Warmup) 
#> Chain 4 Iteration:  600 / 2000 [ 30%]  (Warmup) 
#> Chain 4 Iteration:  700 / 2000 [ 35%]  (Warmup) 
#> Chain 4 Iteration:  800 / 2000 [ 40%]  (Warmup) 
#> Chain 4 Iteration:  900 / 2000 [ 45%]  (Warmup) 
#> Chain 4 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
#> Chain 4 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
#> Chain 4 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
#> Chain 4 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
#> Chain 4 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
#> Chain 4 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
#> Chain 4 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
#> Chain 4 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
#> Chain 4 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
#> Chain 4 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
#> Chain 4 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
#> Chain 4 Iteration: 2000 / 2000 [100%]  (Sampling) 
#> Chain 4 finished in 0.0 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 0.0 seconds.
#> Total execution time: 0.7 seconds.
#> • built target example_mcmc_model
#> • start target example_summary_model
#> • built target example_summary_model
#> • start target example_diagnostics_model
#> • built target example_diagnostics_model
#> • start target report
#> • built target report
#> • start target example_draws_model
#> • built target example_draws_model
#> • end pipeline

# but if the report needs to be rebuilt, profiling info is lost
unlink("report.html")
targets::tar_make()
#> ✓ skip target example_data
#> ✓ skip target example_file_model
#> ✓ skip target example_mcmc_model
#> ✓ skip target example_summary_model
#> ✓ skip target example_diagnostics_model
#> • start target report
#> Quitting from lines 24-25 (report.Rmd) 
#> x error target report
#> • end pipeline
#> Error : No profile files found. The model that produced the fit did not use any profiling.
#> Error: callr subprocess failed: No profile files found. The model that produced the fit did not use any profiling.
#> Visit https://books.ropensci.org/targets/debugging.html for debugging advice.

Created on 2021-05-20 by the reprex package (v2.0.0)

If I had to guess, supporting profiling would involve saving the profiler CSV files as a file target with the fit object's fit$save_profile_files() method: https://mc-stan.org/cmdstanr/reference/fit-method-save_output_files.html

Answered by wlandau


wlandau (Maintainer) · 2021-05-21T19:03:25Z

For posterior draws and sampler diagnostics, the CmdStanFit object keeps the data in memory once they have been read, so they survive when the whole object is saved. stantargets reads them explicitly before saving the fit:

stantargets/R/tar_stan_mcmc.R, lines 467 to 469 in d1b6c4c:

fit$draws() # Do not specify variables or inc_warmup.
try(fit$sampler_diagnostics(), silent = TRUE)
try(fit$init(), silent = TRUE)

This does not appear to be the case with profiler samples.

library(cmdstanr)
#> This is cmdstanr version 0.4.0.9000
#> - Online documentation and vignettes at mc-stan.org/cmdstanr
#> - CmdStan path set to: /Users/c240390/.cmdstanr/cmdstan-2.26.1
#> - Use set_cmdstan_path() to change the path
callr::r(function() {
  library(cmdstanr)
  mcmc_program <- write_stan_file(
    'data {
    int<lower=0> N;
    int<lower=0,upper=1> y[N];
  }
  parameters {
    real<lower=0,upper=1> theta;
  }
  model {
    profile("likelihood") {
      y ~ bernoulli(theta);
    }
  }
  generated quantities {
    int y_rep[N];
    profile("gq") {
      y_rep = bernoulli_rng(rep_vector(theta, N));
    }
  }
  ')
  mod_mcmc <- cmdstan_model(mcmc_program)
  data <- list(N = 10, y = c(1,1,0,0,0,1,0,1,0,0))
  fit <- mod_mcmc$sample(data = data, seed = 123, refresh = 0)
  profiles <- fit$profiles()
  draws <- fit$draws()
  saveRDS(fit, "fit.rds")
})
#> NULL

fit <- readRDS("fit.rds")
str(fit$draws())
#>  'draws_array' num [1:1000, 1:4, 1:12] -8.18 -8.2 -8.22 -8.28 -8.19 ...
#>  - attr(*, "dimnames")=List of 3
#>   ..$ iteration: chr [1:1000] "1" "2" "3" "4" ...
#>   ..$ chain    : chr [1:4] "1" "2" "3" "4"
#>   ..$ variable : chr [1:12] "lp__" "theta" "y_rep[1]" "y_rep[2]" ...

fit$profiles()
#> Error: No profile files found. The model that produced the fit did not use any profiling.

Created on 2021-05-21 by the reprex package (v2.0.0)

@jgabry and @rok-cesnovar, would it be reasonable to support in-memory caching of profiler samples in CmdStanFit objects? I would prefer that stantargets avoid interacting directly with CmdStan CSV files.
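The in-memory caching proposed here amounts to reading lazily loaded output into the object before it is serialized. The sketch below illustrates that pattern in base R with a hypothetical stand-in object: make_fit() and its profiles() accessor are invented for illustration and are not cmdstanr's CmdStanFit internals, the CSV contents are toy values rather than real profiler timings, and the error message only mimics the one in the reprex.

```r
# Hypothetical stand-in for a fit object: profiler output stays in a CSV
# file on disk until the profiles() accessor reads it and caches it in memory.
# make_fit() is invented for illustration; it is not cmdstanr's CmdStanFit.
make_fit <- function(profile_csv) {
  force(profile_csv)  # store the path itself, not an unevaluated promise
  cache <- new.env()
  cache$profiles <- NULL
  list(
    profiles = function() {
      if (is.null(cache$profiles)) {
        if (!file.exists(profile_csv)) {
          # Mimics the message seen in the reprex.
          stop("No profile files found.")
        }
        cache$profiles <- utils::read.csv(profile_csv)
      }
      cache$profiles
    }
  )
}

# Toy profiler table (made-up timings, not real CmdStan output).
profile_data <- data.frame(name = c("likelihood", "gq"),
                           total_time = c(0.01, 0.002))
csv_lazy <- tempfile(fileext = ".csv")
csv_cached <- tempfile(fileext = ".csv")
utils::write.csv(profile_data, csv_lazy, row.names = FALSE)
utils::write.csv(profile_data, csv_cached, row.names = FALSE)

lazy_fit <- make_fit(csv_lazy)      # profiles never read before saving
cached_fit <- make_fit(csv_cached)
invisible(cached_fit$profiles())    # force the CSV into memory first

# saveRDS() serializes each closure's environment, including `cache`.
rds_lazy <- tempfile(fileext = ".rds")
rds_cached <- tempfile(fileext = ".rds")
saveRDS(lazy_fit, rds_lazy)
saveRDS(cached_fit, rds_cached)

# Simulate the CSV files being gone when the object is reloaded later.
unlink(c(csv_lazy, csv_cached))

restored_lazy <- readRDS(rds_lazy)
restored_cached <- readRDS(rds_cached)

lazy_result <- tryCatch(restored_lazy$profiles(), error = conditionMessage)
cached_result <- restored_cached$profiles()
print(lazy_result)
print(cached_result)
```

The object whose accessor was called before saveRDS() still returns its profiles after the CSV is deleted, while the uncached one fails the same way fit$profiles() does in the reprex (where, at the time, calling the accessor did not cache anything). This mirrors why stantargets calls fit$draws(), fit$sampler_diagnostics(), and fit$init() before saving the fit.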

3 replies

rok-cesnovar May 21, 2021

Yes, we should do the same thing we do for samples when storing the object. Mind opening an issue? Thanks!

jgabry May 21, 2021

Yeah, I agree we should do this for profiler output too.

wlandau May 22, 2021 (Maintainer)

Thank you both! I will post an issue.

mbjoseph (Author) · 2021-05-23T01:59:58Z

Thanks, all. Posting the cmdstanr PR here to close the loop: stan-dev/cmdstanr#508

1 reply

wlandau May 23, 2021 (Maintainer)

Yes, thanks @rok-cesnovar for the quick patch! This should work with stantargets as of 62ac86c.
