Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate ggplot functions in future rstan? #1051

Open
jgabry opened this issue Mar 16, 2023 · 53 comments
Open

Deprecate ggplot functions in future rstan? #1051

jgabry opened this issue Mar 16, 2023 · 53 comments

Comments

@jgabry
Copy link
Member

jgabry commented Mar 16, 2023

@bgoodri @andrjohns @hsbadr Should we deprecate all the ggplot functions in rstan? Maybe in conjunction with the release of 2.26 or another future version? The ggplot functions in rstan were added before bayesplot existed and I'm not sure if there's any reason to keep supporting both. We could deprecate and point users to bayesplot. Thoughts?

@bgoodri
Copy link
Contributor

bgoodri commented Mar 16, 2023

Yes

@hsbadr
Copy link
Member

hsbadr commented Mar 16, 2023

Agreed. We can remove them from both v2.26 & the experimental branch. Though, not sure if any dependency is using them right now. This won't affect the release of StanHeaders v2.26.

@jgabry
Copy link
Member Author

jgabry commented Mar 16, 2023

I think there may be a few dependencies using them, but not many. How about just a deprecation warning at first and then removing them entirely at some point in the future? If that sounds good I can make a PR against the experimental branch adding the deprecation warnings.

@hsbadr
Copy link
Member

hsbadr commented Mar 16, 2023

@andrjohns Can you work on this for the experimental branch? I'll backport to v2.26 and run reverse dependency checks.

@hsbadr
Copy link
Member

hsbadr commented Mar 16, 2023

I think there may be a few dependencies using them, but not many. How about just a deprecation warning at first and then removing them entirely at some point in the future? If that sounds good I can make a PR against the experimental branch adding the deprecation warnings.

That works, but we could also depend on bayesplot or create aliases (if bayesplot is installed use it; otherwise, print a deprecation warning).

I think there may be a few dependencies using them

Alternatively, we could remove them completely and contact the maintainers to switch to bayesplot.

@hsbadr
Copy link
Member

hsbadr commented Mar 16, 2023

I vote for removing them completely, if the number of dependencies is small. We can easily fix it for them.

@jgabry
Copy link
Member Author

jgabry commented Mar 16, 2023

That's true, although I think there are a lot of users who got accustomed to using the plotting functions in RStan and giving them one release with a message pointing them to bayesplot would be nice (instead of them just getting an error from R that the function doesn't exist). @bgoodri and @andrjohns what do you guys think?

@andrjohns
Copy link
Contributor

andrjohns commented Mar 16, 2023

If it's going to require fixing other package dependencies, I'd prefer to wait until 2.31 is on CRAN before removing. At the moment the submission path for 2.31 is relatively clear, with all dependencies having patches either submitted or released to CRAN. I'd rather not add an additional source of delay/breakage on that side of things

@jgabry
Copy link
Member Author

jgabry commented Mar 16, 2023

@andrjohns That makes sense. Are you ok if I add deprecation warnings? That way everything continues to work (no breaking dependencies) but we warn them of the change coming in the future?

@andrjohns
Copy link
Contributor

I'm on board with deprecation warnings

@jgabry
Copy link
Member Author

jgabry commented Mar 16, 2023

I can make a PR. Is the right place to do that the experimental branch?

@hsbadr
Copy link
Member

hsbadr commented Mar 16, 2023

I can make a PR. Is the right place to do that the experimental branch?

Yes. That's the target for v2.32 after releasing StanHeaders v2.26; we can jump directly to the latest version of rstan since StanHeaders now includes a compatible version of stanc3.

@jgabry
Copy link
Member Author

jgabry commented Mar 16, 2023

It occurs to me that the one thing we lose by removing these functions from rstan is the ability to automatically unconstrain parameters if the user wants (I think this is why we haven't already deprecated them, if I remember previous discussion accurately). bayesplot can't unconstrain the parameters automatically, so the user would have to unconstrain and then pass to bayesplot if they want plots of unconstrained parameters.

Are we ok with that or is that sufficient reason to keep plots in rstan itself?

@andrjohns
Copy link
Contributor

Could the rstan functions instead just be a wrapper that performs the unconstraining and then calls bayesplot? It could also be something that gets built into bayesplot

@andrjohns
Copy link
Contributor

The idea being not to remove existing functionality, in case it's needed in a workflow somewhere

@jgabry
Copy link
Member Author

jgabry commented Mar 16, 2023

Could the rstan functions instead just be a wrapper that performs the unconstraining and then calls bayesplot? It could also be something that gets built into bayesplot

Maybe the first option. I think if we can keep bayesplot as separate as possible and not call rstan::unconstrain_pars inside bayesplot that would be ideal.

This is only really an issue for complicated transformations. For a standard deviation, for example, it's easy to get bayesplot to do the transformation:

bayesplot::mcmc_hist(x, transformations = list(sigma = "log")) 

but you have to know the function to specify. For more complicated transformations it would be tough for the user to do this without relying on unconstrain_pars. So yeah maybe we should do it in rstan and then call bayesplot instead of deprecating the rstan functions. I'll look into that.

@jgabry
Copy link
Member Author

jgabry commented Mar 16, 2023

@andrjohns What's the best way for me to build the experimental branch locally? Seems like I need StanHeaders 2.31.0, right?

@andrjohns
Copy link
Contributor

I either just devtools::install() the StanHeaders subdir and then the rstan/rstan subdir, or use the pre-built package source from any of the recent actions runs on the experimental branch

@andrjohns
Copy link
Contributor

(and yeah I normally use StanHeaders 2.31 with it as well)

@jgabry
Copy link
Member Author

jgabry commented Mar 16, 2023 via email

@jgabry
Copy link
Member Author

jgabry commented Mar 16, 2023

@andrjohns I successfully built the experimental branch, and it's a good thing I tried it because I think the existing code for plotting unconstrained parameters breaks, potentially due to changes in the generated C++ code in newer versions of Stan.

In the past this (admittedly hacky) helper function was able to get the names of the variables declared in the parameters block:

.get_stan_params <- function(object) {
stopifnot(is.stanfit(object))
params <- grep("context__.vals_r", fixed = TRUE, value = TRUE,
x = strsplit(get_cppcode(get_stanmodel(object)), "\n")[[1]])
params <- sapply(strsplit(params, "\""), FUN = function(x) x[[2]])
intersect(params, object@model_pars)
}

(I think it would get data + parameters and then drop the data by intersecting with model_pars)

But now it seems that grepping for context__.vals_r in the C++ code only returns variables in the data block not the parameters block. This basically breaks the plotting code for unconstrained parameters (we were using this to provide a way to unconstrained parameters automatically for the user, for which we needed to know if the parameter the user asked to plot was in the parameters block -- if it was in generated quantities, for example, we couldn't unconstrain it).

Do you know of a way to get the names of variables just in the parameters block that I could use instead of this?

@andrjohns
Copy link
Contributor

Ooh good question. Wouldn't it be easier to grep the Stan code in the stanfit object for the entries of the parameters block? Then it wouldn't be dependent on the C++

@jgabry
Copy link
Member Author

jgabry commented Mar 16, 2023 via email

@jgabry
Copy link
Member Author

jgabry commented Mar 16, 2023

Is there a way to run stanc3's auto formatter (I don't see it as an option to stanc)? I think one reason we didn't grep the Stan code is that we'd need to find the parameters block but the user could have the word "parameters" in comments. The block doesn't even have to start on a new line due to Stan's whitespace rules, so we can't just search for "parameters" starting a line. There may be other issues along those lines too. (I think the auto formatter removes comments if I recall correctly).

@hsbadr
Copy link
Member

hsbadr commented Mar 16, 2023

Is there a way to run stanc3's auto formatter (I don't see it as an option to stanc)?

The auto-formatting option generates Stan code while stanc() generates C++ code. To run the formatter internally, use auto_format = TRUE in stanc_process or set stanc.auto_format option to TRUE.

@jgabry
Copy link
Member Author

jgabry commented Mar 16, 2023

Is there a way to run stanc3's auto formatter (I don't see it as an option to stanc)?

The auto-formatting option generates Stan code while stanc() generates C++ code. To run the formatter internally, use auto_format = TRUE in stanc_process or set stanc.auto_format option to TRUE.

Thanks! I'll try stanc_process. I think I'll need to take the user's stanfit object that they pass to the plotting function, grab the stan code, and feed it into stanc_process with auto_format = TRUE. That seems like the only way to get an auto formatted version of the user's existing Stan program?

@hsbadr
Copy link
Member

hsbadr commented Mar 16, 2023

The easiest way is to set the option:

options(stanc.auto_format = TRUE)

We could enable it by default (which would help brms; see paul-buerkner/brms#1376) or pass auto_format argument to stanc(), if needed.

@jgabry
Copy link
Member Author

jgabry commented Mar 16, 2023

But actually I guess auto formatting doesn't remove user code comments (so the word "parameters" can appear anywhere in the Stan file even after auto formatting). I need a way to figure out which variables are declared in the parameters block (as opposed to transformed parameters and generated quantities). @andrjohns suggested grepping the Stan code itself but that seems potentially error prone if the word "parameters" can appear anywhere in user comments.

@hsbadr
Copy link
Member

hsbadr commented Mar 16, 2023

You could use a C++ preprocessor to remove the comments; something like $CXX -fpreprocessed -dD -E <filename.cpp>.

@hsbadr
Copy link
Member

hsbadr commented Mar 16, 2023

For Stan code, could we request this in stanc3?

@jgabry
Copy link
Member Author

jgabry commented Mar 16, 2023

Is there a way to use stanc3's existing --info option in the experimental rstan branch? I think that would actually give the information I need.

@hsbadr
Copy link
Member

hsbadr commented Mar 16, 2023

Sure. We can pass any argument to stanc3 JS, if it's accepted. I'd suggest that we create a new helper function though since stanc() won't need to use this argument. Try the following:

model_info <- stanc_ctx$call("stanc", model_cppname, model_code, as.array("info"))

stanc_ctx is defined internally in rstan on load.

@jgabry
Copy link
Member Author

jgabry commented Mar 16, 2023

Awesome, thanks. That works. I can get the following now:

{
  "inputs": {},
  "parameters": {
    "theta": { "type": "real", "dimensions": 0 },
    "beta": { "type": "real", "dimensions": 1 }
  },
  "transformed parameters": { "omega": { "type": "real", "dimensions": 1 } },
  "generated quantities": {},
  "functions": [],
  "distributions": [],
  "included_files": []
}

@hsbadr
Copy link
Member

hsbadr commented Mar 16, 2023

Awesome, thanks. That works.

Great! I suggest that you wrap it with try() and fallback to another approach or return and error message:

    model_info <- try(stanc_ctx$call("stanc", model_cppname,
                                     model_code, as.array("info")),
                                     silent = TRUE)
    if (!inherits(model_info , "try-error")) {
      # model_info <- ?
      stop(<error message>)
    }

@jgabry
Copy link
Member Author

jgabry commented Mar 16, 2023

Good idea

@hsbadr
Copy link
Member

hsbadr commented Mar 16, 2023

Also, if the info is in a standard format (yaml?), you can interpret it directly in R without grep as text.

@jgabry
Copy link
Member Author

jgabry commented Mar 16, 2023

jsonlite::fromJSON(model_info$result) seems to work really nicely to get this into a R list, but that requires the jsonlite package that rstan doesn't currently depend on. Do you know anything in base R that could handle it?

@hsbadr
Copy link
Member

hsbadr commented Mar 16, 2023

Do you know anything in base R that could handle it?

No. I'd create a simple helper function to convert the JSON info, if it's safe to do so. Otherwise, adding the dependency could be helpful depending on the value of the ability to automatically unconstraint the parameters.

@hsbadr
Copy link
Member

hsbadr commented Mar 16, 2023

We could also use the info in other places, replacing a lot of grep code.

@jgabry
Copy link
Member Author

jgabry commented Mar 16, 2023

I'll see how much of a pain it is to parse the JSON without adding a dependency, but yeah it could be worth adding the dependency if we can use this info elsewhere.

@WardBrian
Copy link
Member

if the info is in a standard format (yaml?)

the info should be in standards-compliant JSON. Do ping me if there is anything you wish --info printed out but doesn't, and we can see if it's feasible

@hsbadr
Copy link
Member

hsbadr commented Mar 20, 2023

Do ping me if there is anything you wish --info printed out but doesn't, and we can see if it's feasible

Thanks @WardBrian! That would be great. We could rely on stanc3 --info instead of dealing with the code as text. I'll look into this ASAP, to list the missing info we may need.

@jgabry
Copy link
Member Author

jgabry commented Mar 20, 2023

After playing around with all of this for a while, I think the fundamental issue here is that unconstrain_pars is just really hard to use. If the user could easily unconstrain all the posterior draws for a specified subset of parameters then we could just get rid of all the plotting code in RStan, which would be great.

Basically what we need is something like

unconstrain_draws(stanfit, pars = "tau") 

which returns all of the posterior draws for tau but unconstrained. If we had this then we wouldn't need to maintain any plotting code in RStan. The user could just do something like this

stanfit %>%
 unconstrain_draws(pars = "tau") %>% 
 bayesplot::mcmc_trace()

@andrjohns You know more about unconstrain_pars than I do (or I assume so since you figured it out for CmdStanR and I haven't spent much time with it). How hard do you think this would be to do for RStan?

@andrjohns
Copy link
Contributor

That's a good-looking syntax!

Yeah it wouldn't be too hard to implement, the main thing is making a "sane" structure for the return (I'm not overly happy with how cmdstanr returns them now - a list of vectors per chain). A discussion I had with @n-kall a while back was to implement the structured format for unconstrained pars in the draws_* format, so that we can easily handle multiple chains in a syntax/structure consistent with how users interact with the constrained draws.

I'll be getting back to cmdstanr development later this week (now that there's not much more do for rstan for a bit), so I can implement that in cmdstanr and then port the approach over to rstan. It will be easier to do in cmdstanr first since the plumbing for unconstraining everything is already in place, it's just a structure/format change

@jgabry
Copy link
Member Author

jgabry commented Mar 20, 2023 via email

@WardBrian
Copy link
Member

This may be an inappropriate place to ask, but since all the major stakeholders seem to be participating here: Is there a good place to follow along with the general status of RStan 2.26, CRAN updates, etc?

@andrjohns
Copy link
Contributor

This may be an inappropriate place to ask, but since all the major stakeholders seem to be participating here: Is there a good place to follow along with the general status of RStan 2.26, CRAN updates, etc?

For 2.26 you can follow #1034, that's tracking the progress of the downstream packages and their patching, as well as the submission decision

For 2.31+ you can follow #1046

@hsbadr
Copy link
Member

hsbadr commented Mar 20, 2023

This may be an inappropriate place to ask, but since all the major stakeholders seem to be participating here: Is there a good place to follow along with the general status of RStan 2.26, CRAN updates, etc?

The next step is to release StanHeaders v2.26, which should be ready. The reverse dependency checks are available here. The failing dependencies are temporary and can be passed by communicating with CRAN.

@WardBrian
Copy link
Member

Does that mean it could be a matter of days (rather than weeks, months, etc.) for RStan 2.26? Or would you expect it to still be further off

@andrjohns
Copy link
Contributor

@bgoodri how have CRAN normally been for rstan releases? Much of a battle?

@bob-carpenter
Copy link

how have CRAN normally been for rstan releases? Much of a battle?

There's a reason there hasn't been a new RStan release in years. I believe the choke point is lack of dependency management in CRAN. Without that, Stan's policy of not maintaining low-level backward compatibility of internals has made it impossible to get a new release out that is compatible with old and new versions of Stan, which would be needed to release without pinning dependency versions.

@andrjohns
Copy link
Contributor

andrjohns commented Mar 20, 2023

We've actually got all of that in place for 2.26 and then 2.31 (sans ~10-15 packages yet to merge and release patches I've submitted), it's just a matter of the actual submission now and whether that's enough notice to maintainers for CRAN to accept breaking their packages

@hsbadr
Copy link
Member

hsbadr commented Mar 20, 2023

Does that mean it could be a matter of days (rather than weeks, months, etc.) for RStan 2.26? Or would you expect it to still be further off

This is a good question for @bgoodri :)

He could submit it now and provide notes to justify the failed dependencies.

There's a reason there hasn't been a new RStan release in years.

Originally, dozens of packages failed until we implemented Stan/Math patches for backward compatibility and notified the package maintainers about the required changes (and even submitted PRs to fix them) months or years in advance. CRAN is ok with the notices alone, and we've already fixed the majority of dependencies. Only two packages fail now with segfault due to the incompatibility of the new StanHeaders and the old rstan (precompiled with different headers). This is a circular dependency.

CRAN Diagram

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants