Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Chains objects display only information and not statistical eval #307

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

PaulinaMartin96
Copy link
Contributor

@PaulinaMartin96 PaulinaMartin96 commented Jun 10, 2021

As suggested on #246, Chains objects now display only information and not statistical evaluations.

Before:
MCMCChainsnotmodified

After:
MCMCChainsmodified

Also, the describe function now display both Summary statistics and Quantiles info.

Before:
describenotmodified

After:
describemodified

@PaulinaMartin96 PaulinaMartin96 marked this pull request as ready for review June 12, 2021 03:34
@cpfiffer
Copy link
Member

cpfiffer commented Jul 6, 2021

Woops, I missed this one as reviewable. If I forget, as I do often, you can flag me by requesting a review over here:

image

Copy link
Member

@cpfiffer cpfiffer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For PRs like this where a UI change is involved, try to include a printout of the new behavior so someone can quickly look at the PR to see what it looks like. For example, I would want to see the results of show(chain), which is the primary change introduced by this PR.

Could you add that for me so I can take a look at the end result that a user might see?

Also, we follow ColPrac, one key feature of which is always making sure that the version number of whatever package you submit a PR to is increased according to semantic versioning. You can find the version number for MCMCChains in Project.toml, here.

For a change like this where the default behavior changes, I would recommend bumping us up to MCMCChains 4.14.0.

Flag me after you add a little description of what's changing and what I should expect to see and I'll do another pass on this one.

@cpfiffer cpfiffer mentioned this pull request Jul 6, 2021
@PaulinaMartin96 PaulinaMartin96 changed the title Make Chains objects display only information and not statistical eval… Make Chains objects display only information and not statistical eval Jul 7, 2021
@PaulinaMartin96
Copy link
Contributor Author

@cpfiffer thank you so much for the suggestions! I'll consider them in the future. I think the PR is ready. There's a conflict with Project.toml. Let me know if further changes are required.

@cpfiffer
Copy link
Member

cpfiffer commented Jul 7, 2021

Okay, thanks for adding that stuff. Looks good to me.

When there's a merge conflict, it's typically the PR person's job to fix the merge conflict. Fortunately this one is really easy -- you can use the GitHub web editor or your local git merge manager to fix the issues. GitHub has links on how to do this at the bottom of the PR scroll:

image

This is a good learning opportunity since it's one of the easier merge conflicts. Go ahead and try to fix this issue. In this case, you basically want to fix the version number to the one you specified (4.14.0) and not the one that's currently in master.

@PaulinaMartin96
Copy link
Contributor Author

Sorry for that, and again, thank you so much for your advice!

@cpfiffer
Copy link
Member

cpfiffer commented Jul 7, 2021

No problem, all part of the learning process. I've just enabled tests and will check back to make sure they've passed.

src/stats.jl Outdated
@@ -171,6 +171,15 @@ function describe(
return dfs
end

function Base.show(io::IO, mime::MIME"text/plain", cs::Vector{ChainDataFrame})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only captures the abstract Vector{ChainDataFrame} but not any concretely typed Vector{<:ChainDataFrame}. In general, I am a bit worried about changing the display of vectors of ChainDataFrames - it seems wrong to completely opt out of the default display mechanism of vectors in Julia (I also wonder if it causes any problems) just to change the way in which describe(chain) is displayed. Maybe rather describe should not return a vector of ChainDataFrame.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm down with this. My understanding of your suggestion is that describe should be a pure IO function -- we should make display(io::IO, chn::Chain) and the output is all the stuff inside the Base.show definition above.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not define display (it calls show(io, MIME("text/plain"), x) which should be implemented), but we should just implement DataAPI.describe(io, chain) if we want to display the summary statistics in a nice way: https://juliastats.org/StatsBase.jl/latest/scalarstats/#Summary-Statistics-1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for your help! So, to double check that I'm understanding these suggestions. Instead of returning a Vector{ChainDataFrame}, describe should be an implementation of StatsBase.describe and return something like this?

chn = Chains(rand(100, 2, 2), [:a, :b])
chn_arr = Array(chn)
sections = chn.name_map[:parameters]
for i in 1:length(sections)
    println("Parameter $(sections[i])")
    describe(chn_arr[:,i])
    println()
end

StatsBase _describe

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine if summarystats returns different summary statistics (so this method does not have to be changed) but we should make sure that describe just prints these summary statistics in a pretty way to be consistent with how describe and summarystats are defined in StatsBase. I.e., in particular describe should not return anything but only print to IO and it should not print the quantiles if they are not part of summarystats (here it might actually be better to include them in summarystats as well and return two dataframes, possibly as a named tuple).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify maybe a little bit, there's no need to change to this code you've posted:

chn = Chains(rand(100, 2, 2), [:a, :b])
chn_arr = Array(chn)
sections = chn.name_map[:parameters]
for i in 1:length(sections)
    println("Parameter $(sections[i])")
    describe(chn_arr[:,i]) # <- This is not what we want, we want to print the results of `describe(chain)` here instead
    println()
end

Basically we want to change the stuff that is currently in show to describe, so that describe becomes a pure IO function and not a weird Vector{<:ChainDataFrame} thing that we have now.

Maybe one way is to do something like

function DataAPI.describe(io::IO, chains::Chains)
    print(io, "Chains ", chains, ":\n\n", header(chains))

    summstats = summarystats(chains)
    qs = quantiles(chains)

    println(io)
    show(io, summstats)

    println(io)
    show(io, qs)
end

which won't actually return anything, it just prints stuff out to the screen. There's probably a way more sane way to do this, but it's a rough sketch to get you started.

@delete-merged-branch delete-merged-branch bot deleted the branch TuringLang:master December 24, 2021 10:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants