Add sbc_hist #193

jpritikin · 2019-05-17T12:18:02Z

Compared to the previous version, I added:

the pretty uncertainty interval butterfly
a bit more documentation
an example

jgabry · 2019-05-17T16:59:19Z

Thanks a lot! Will review soon.

jgabry · 2019-05-17T17:27:24Z

@jpritikin How do you feel about enabling this https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork so I can push directly to your PR branch?

jpritikin · 2019-05-17T17:29:01Z

> @jpritikin How do you feel about enabling this https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork so I can push directly to your PR branch?

Looks like it's already enabled. Go ahead with your changes.

jgabry · 2019-05-17T17:47:18Z

Ok great thanks.

jgabry · 2019-05-17T22:31:45Z

I made a bunch of edits so that it conforms to the conventions we've been using in bayesplot. I still need to add a few tests and I'd like to get @avehtari (and/or @dpsimpson) to take a look and comment on a few things:

The first argument is called ranks even though it’s really indicators that can be made into ranks. This lines up with what @bgoodri did in rstan::sbc() so that’s nice. But should we call this argument something else?
The worst argument is used to specify how many parameters to plot (e.g., worst=5 to plot the 5 worst parameters). I like this idea, but do we want any mechanism for selecting parameters by name? It seems less important to have that for this plot than the other plots in bayesplot, but for consistency should we have it?
The way the "worst" parameters are determined is using KL divergence. Seems reasonable, but is this the best option for this scenario?
Should there be changes to the default settings for the arguments?
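
For readers unfamiliar with the KL-based ordering, here is a minimal sketch of how such a ranking could work. It is not the code in this PR: the helper names `kl_from_uniform()` and `worst_pars()` are hypothetical, the input is assumed to be a matrix of bin counts with one named column per parameter, and the divergence direction (histogram relative to uniform) is an assumption.

```r
# Hypothetical helper: KL divergence of a binned rank histogram from the
# uniform distribution. `counts` holds the bin counts for one parameter.
kl_from_uniform <- function(counts) {
  p <- counts / sum(counts)      # observed bin proportions
  q <- 1 / length(counts)        # uniform proportion per bin
  nz <- p > 0                    # 0 * log(0) is treated as 0
  sum(p[nz] * log(p[nz] / q))
}

# Hypothetical helper: names of the `worst` parameters, largest divergence first.
worst_pars <- function(count_matrix, worst = 5) {
  kl <- apply(count_matrix, 2, kl_from_uniform)
  names(sort(kl, decreasing = TRUE))[seq_len(min(worst, length(kl)))]
}

counts <- cbind(a = c(25, 25, 25, 25),
                b = c(70, 10, 10, 10),
                c = c(30, 20, 30, 20))
worst_pars(counts, worst = 2)
```

Selecting by name could coexist with this by subsetting the columns before or instead of ranking.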

Example

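# create some fake inputs to use for sbc_hist()
# (sbc_hist() is the function added in this PR; color_scheme_set() is from bayesplot)
library(bayesplot)
set.seed(19)
pars <- paste0("beta[", 1:4, "]")
samples_per_prior <- 511
n_replications <- 500
ranks <- vector("list", n_replications)
for (n in seq_len(n_replications)) {
  # one 0/1 indicator matrix per replication: rows are draws, columns are parameters
  r1 <- matrix(0, nrow = samples_per_prior, ncol = length(pars),
               dimnames = list(NULL, pars))
  for (p1 in seq_along(pars)) {
    # set a random number of randomly chosen rows to 1
    n_ones <- floor(runif(1, 0, samples_per_prior))
    r1[sample.int(samples_per_prior, n_ones), p1] <- 1
  }
  ranks[[n]] <- r1
}

color_scheme_set("purple")
sbc_hist(ranks)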

codecov-io · 2019-05-17T22:42:31Z

Codecov Report

Merging #193 into master will decrease coverage by 0.03%.
The diff coverage is 97.7%.

@@            Coverage Diff             @@
##           master     #193      +/-   ##
==========================================
- Coverage   99.31%   99.28%   -0.04%     
==========================================
  Files          30       31       +1     
  Lines        4088     4175      +87     
==========================================
+ Hits         4060     4145      +85     
- Misses         28       30       +2

Impacted Files	Coverage Δ
R/sbc.R	`97.7% <97.7%> (ø)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

jgabry · 2019-05-17T23:08:22Z

And one other consideration: do we want the only allowed input type to be a list of matrices? This type of input could also be stored in a 3-D array or a (long) data frame. This doesn’t necessarily need to be decided now since we can always add that ability later without breaking anything.
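
To make the alternatives concrete, here is a small sketch of the same information in all three layouts. The toy data and the column names `draw`, `parameter`, `replication`, and `indicator` are assumptions for illustration only, not something `sbc_hist()` accepts today.

```r
# Toy input in the list-of-matrices layout: one matrix per replication,
# rows are draws, columns are parameters.
pars <- c("beta[1]", "beta[2]")
ranks <- list(
  matrix(c(1, 0, 1, 0, 0, 1), nrow = 3, dimnames = list(NULL, pars)),
  matrix(c(0, 0, 1, 1, 1, 0), nrow = 3, dimnames = list(NULL, pars))
)

# 3-D array layout: draws x parameters x replications
d <- c(dim(ranks[[1]]), length(ranks))
arr <- array(unlist(ranks), dim = d, dimnames = list(NULL, pars, NULL))

# Long data frame layout: one row per (draw, parameter, replication).
# expand.grid() varies its first argument fastest, which matches the
# column-major order of as.vector(arr).
long <- data.frame(
  expand.grid(draw = seq_len(d[1]), parameter = pars, replication = seq_len(d[3])),
  indicator = as.vector(arr)
)
```

Since the conversions are mechanical, accepting the other layouts later would only need a normalization step at the top of the function.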

jpritikin · 2019-05-17T23:10:33Z

Generally looks good, but why did you get rid of the horz dotted lines at the top and bottom on the butterfly? Especially for the lower side, the histogram bars can cover most of the butterfly and it's hard to see the boundary. If you don't like the top dotted line, at least keep the bottom one.

jgabry · 2019-05-17T23:12:51Z

Oops. I meant to put that back in. I had removed it while trying to figure something else out. I’ll add it back.

jpritikin · 2019-05-17T23:15:08Z

I'm curious why you kept the horz line in the middle of the butterfly. I mean, what matters is whether the histogram bars are within the butterfly or not. It doesn't really matter whether they are close to the mean, correct?

jgabry · 2019-05-17T23:19:28Z

I just did it to match the paper, and I think it looks nice. But now that you mention it I think it could draw the eye to that part of the plot which isn’t the most relevant. Thanks for the feedback.

* remove segment in center of CI
* add segments at extremes of CI (lower one is in front of histogram, upper is behind, I think it looks nice)
* add x-axis ticks and labels at 0, samples_per_prior/2, samples_per_prior (this does convey some information)

jgabry · 2019-05-18T00:09:45Z

Here's the result of the example from above after the following changes:

remove segment in center of CI
add faint line segments (not full lines) at extremes of CI (lower one is in front of histogram, upper is behind, I think it looks nice)
add x-axis ticks and labels at 0, samples_per_prior/2, samples_per_prior (this does actually convey some information)

also rename samples_per_prior to thinned_sample_size in internal code

jgabry · 2019-05-18T00:25:31Z

jpritikin · 2019-05-18T02:19:13Z

Beautiful 👍

avehtari · 2019-05-19T20:39:42Z

Thanks for working on this. Some comments

The butterfly histogram is also useful for PITs and MCMC rank plots (see #179), so this should be modularized so that there is only one butterfly histogram plotting function.

> The first argument is called ranks even though it’s really indicators that can be made into ranks. This lines up with what @bgoodri did in rstan::sbc() so that’s nice. But should we call this argument something else?

As @jpritikin noticed in the other pull request, it's a waste of memory to save indicators; it would be better to provide just the ranks to this function.

> The worst argument is used to specify how many parameters to plot (e.g., worst=5 to plot the 5 worst parameters). I like this idea, but do we want any mechanism for selecting parameters by name?

Yes, we want to be able to select parameters by name, too. There are often specific parameters we care more about.

> The way the "worst" parameters are determined is using KL divergence. Seems reasonable, but is this the best option for this scenario?

In the paper and in discussions outside the paper, we don't recommend any quantity. KL is also known to miss some things that other diagnostics (there's a 20+ uniformity tests paper) or a human can see.

> #' @param thin An integer indicating the thinning interval to use when plotting
> #' so that the histograms consist of (close to) independent realizations. Set
> #' the thin argument such that the resulting number of draws approximately
> #' matches the effective sample size.

The current arXiv version is wrong about this, and the updated version is not yet complete in other parts. The difference is not big for thetic chains, but antithetic chains may have an effective sample size larger than the number of iterations. In this case the chain should be thinned, too. One way is to thin first by 3 (thinning by odd lag removes the antithetic behavior) and then check again the effective sample size.

IMO there is no need to warn about uneven samples per bin if the variation is small. For example, if we thin we may often have an uneven division, but if, e.g., some bins have 50 and some 51, the variation is 2%, which is visually very small. Alternatively it would be useful to round to even.
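
One possible way to decide when a warning is worthwhile is sketched below. The helper `bin_size_variation()` is hypothetical, and it assumes the quantity of interest is the number of distinct rank values per bin, with variation measured as (largest - smallest) / smallest.

```r
# Hypothetical helper: split n_values consecutive rank values into n_bins
# bins as evenly as possible and report the relative spread of bin sizes.
bin_size_variation <- function(n_values, n_bins) {
  idx <- seq_len(n_values) - 1L             # rank values 0, 1, ..., n_values - 1
  bin <- (idx * n_bins) %/% n_values + 1L   # bin label in 1..n_bins
  sizes <- tabulate(bin, nbins = n_bins)
  (max(sizes) - min(sizes)) / min(sizes)
}

bin_size_variation(101, 2)    # bins of 51 and 50 values
bin_size_variation(128, 32)   # evenly divisible
```

A warning could then be issued only when the returned value exceeds some tolerance; the right tolerance is a judgment call.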

jpritikin · 2019-05-19T20:54:23Z

> As in other pull request @jpritikin noticed that it's waste of memory to save indicators

@avehtari Actually I realized that you can't change the thinning if you don't have the raw ranks data. It's not actually as much data as I originally thought. I don't think it's a problem to pass around the raw ranks.

> we don't recommend any quantity

If there are 1000s of parameters, are you supposed to inspect every plot? There's got to be a way to prioritize, even if it's not perfect.

> there's a 20+ uniformity tests paper

Might be nice to cite. DOI 10.1080/02331880500178562 ?

> One way is to thin first by 3 (thinning by odd lag removes the antithetic behavior) and then check again the effective sample size.

Can this be (partially) automated?

avehtari · 2019-05-19T21:09:33Z

> If there are 1000s of parameters, are you supposed to inspect every plot?

I would not recommend that.

> There's got to be a way to prioritize, even if it's not perfect.

I just stated that we don't recommend anything, as Jonah asked for my opinion. The reason is that we don't know what to recommend. Now that I think about it, it might be useful to have different options, for example showing the most smiley ones, the most frowny ones, the most biased ones, specifically edge problems, etc. If I had to choose just one, I think I would continue testing something computed from the ECDF vs. the envelope for the uniform. I'm sorry that I don't have a better answer for this.

I also think that it's better to show "worst" ones instead of just trying to make hypothesis test that all is fine or not. Visual inspection helps to learn more.

> Might be nice to cite. DOI 10.1080/02331880500178562 ?

Yes. 34 different tests, and they don't recommend any one of them as the best for all. Good to cite as it shows that there is no one good solution.

> Can this be (partially) automated?

If ESS>S, thin by 3. Then proceed as before.
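
A minimal sketch of that rule, under stated assumptions: `auto_thin()` is a hypothetical name, `ess_fun` stands for whatever effective sample size estimator is used (it is supplied by the caller), and "proceed as before" is taken to mean thinning so that the number of remaining draws roughly matches the ESS. The `crude_ess()` function is only an AR(1)-style stand-in for illustration, not a recommended estimator.

```r
# Hypothetical helper. `ess_fun` is any function returning an effective
# sample size estimate for a numeric vector of draws.
auto_thin <- function(draws, ess_fun) {
  S <- length(draws)
  if (ess_fun(draws) > S) {
    # antithetic chain: thin by 3 first, then re-check the ESS below
    draws <- draws[seq(1, S, by = 3)]
    S <- length(draws)
  }
  ess <- max(1, min(ess_fun(draws), S))   # keep the estimate within [1, S]
  thin <- max(1L, round(S / ess))         # draws kept is roughly the ESS
  draws[seq(1, S, by = thin)]
}

# Crude lag-1 (AR(1)-style) ESS estimate, for illustration only
crude_ess <- function(x) {
  rho1 <- stats::acf(x, lag.max = 1, plot = FALSE)$acf[2]
  length(x) * (1 - rho1) / (1 + rho1)
}

set.seed(1)
x <- as.numeric(stats::arima.sim(list(ar = 0.8), n = 1000))
length(auto_thin(x, crude_ess))
```

Passing the estimator as an argument keeps the thinning rule separate from the choice of ESS computation.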

jpritikin · 2019-05-20T00:28:57Z

> I also think that it's better to show "worst" ones instead of just trying to make hypothesis test that all is fine or not.

Agreed. How about if we add some other measures of uniformity in addition to KL and require the user to specify which measure to use in conjunction with worst?

tjmahr · 2019-05-20T14:30:34Z

What's the point of the butterfly shape, as opposed to a ribbon?

I don't think the bin boundaries are working correctly. The first one straddles 0 and the last one does not appear to line up evenly with 128, at least on my device. Histograms work on continuous data, so they don't respect the discrete boundaries of the rankings by default.

The use of the ribbon expands the x-axis so that the nonsensical values are part of the binning, so I think we are getting 128 ranks pushed into 29 bins which is causing uneven binning.

sbc_hist(ranks) + 
  geom_text(aes(label = stat(count)), stat = "bin", bins = 32)

This is probably a better sense of what 32 bins should be...

sbc_hist(ranks) + 
  stat_count(aes(x = plyr::round_any(u, 4, floor)))

tjmahr · 2019-05-20T14:42:25Z

Aah shoot, that last figure has 33 bins because of how 129 was being rounded.

General problem still stands.

jgabry · 2019-05-21T15:21:59Z

Maybe using geom_bar() instead of geom_histogram() would let us get around that problem?
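
One way to sidestep continuous binning is to count the discrete ranks into bins up front and then draw the counts as bars. The sketch below is an illustration, not the PR's implementation: `bin_rank_counts()` is a hypothetical helper, and it assumes ranks take the integer values 0 to n_ranks - 1 and that n_ranks is divisible by the number of bins.

```r
# Hypothetical helper: count ranks in equal-width bins whose edges fall
# between integer rank values. Assumes ranks take values 0..(n_ranks - 1).
bin_rank_counts <- function(u, n_ranks, n_bins) {
  stopifnot(n_ranks %% n_bins == 0, all(u >= 0), all(u < n_ranks))
  width <- n_ranks %/% n_bins              # rank values per bin
  tabulate(u %/% width + 1L, nbins = n_bins)
}

u <- rep(0:127, times = 2)                 # every rank value observed twice
counts <- bin_rank_counts(u, n_ranks = 128, n_bins = 32)
# `counts` could then be drawn as bars, e.g. with ggplot2::geom_col()
```

With 129 possible rank values (0 to 128) there is no way to form 32 equal-width integer bins, which is consistent with the 33 bins seen above.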

avehtari · 2019-05-22T07:55:17Z

Another modularization suggestion. We need independent draws also for MCMC rank ECDF difference plots, and maybe for other things. Provide a common helper function to do thinning with a couple of options:

user set value
thinning by percentage of autocorrelation time: the effective sample size calculation computes the autocorrelation series and truncates it by Geyer's initial (pair) positive sequence rule. Often the last autocorrelations are small, and for practical purposes a useful thinning could be obtained by thinning at the value where the sum of autocorrelations is between 0.9 and 1.1 times the sum of autocorrelations up to the stopping rule. This rule would also work for antithetic chains.

sbc_hist could use this thinning function, too, with possibility of favoring thinning values which make the histogram division more even (ecdf difference plots are not sensitive to this)

jpritikin · 2019-06-12T15:43:19Z

Consider using uniftest to rank uniformity.

avehtari · 2019-06-12T17:15:36Z

@jpritikin thanks for pointing out the package, although it doesn't solve the problem which of the many tests to use.

Add sbc_hist

c776c9a

stan-dev#146

jpritikin force-pushed the master branch from a63ddc9 to c776c9a May 17, 2019 13:48

jgabry added feature new plot labels May 17, 2019

jgabry requested review from jgabry and avehtari May 17, 2019 17:27

jgabry added 2 commits May 17, 2019 18:04

Merge remote-tracking branch 'upstream/master'

17ce7f9

edits to sbc_hist to fit in with bayesplot conventions

79c4d5a

jgabry added 2 commits May 17, 2019 18:35

fixes for r cmd check

8c8d910

missing whitespace

45fc425

jgabry removed their request for review May 17, 2019 22:42

fixes to sbc_hist()

577fff0

* remove segment in center of CI
* add segments at extremes of CI (lower one is in front of histogram, upper is behind, I think it looks nice)
* add x-axis ticks and labels at 0, samples_per_prior/2, samples_per_prior (this does convey some information)

make CI line colors depend on current color scheme

be4ecdf

also rename samples_per_prior to thinned_sample_size in internal code

a few tests for sbc_hist

633bd0b
