Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Parameters to an Existing Chain #298

Open
farr opened this issue Apr 29, 2021 · 5 comments
Open

Adding Parameters to an Existing Chain #298

farr opened this issue Apr 29, 2021 · 5 comments

Comments

@farr
Copy link

farr commented Apr 29, 2021

Perhaps this is more related to #246 but I am universally in a situation where I want to add additional parameters to my chains, and this is unbelievably hard in the current architecture (it would be helped by work on Project 1 from #246 where the backend array format is replaced with something a bit more flexible). Often these are "generated parameters" from the generated_quantities function, but sometimes they go beyond that and are quantities (often not scalars! so it would be nice not to have the clunky [chain[Symbol("x[$i]")] for i in 1:N] interface for such) that I've computed for each sample and would like to store in a convenient way. It would be ideal to be able to do something like

my_quantity = map(my_function, chain[:x], chain[:y], chain[:z], ...)
chain[:my_quantity] = my_quantity

but maybe

push!(chain, :my_quantity => my_quantity, ...)

would be more natural?

In the meantime, I've written a little slice of code that I end up putting into all my Turing projects that can at least append the generated_quantities output to a chain:

function _names(k, d)
    string(k)
end
function _names(k, d::Vector)
    ["$(k)[$(i)]" for i in 1:length(d)]
end

function _3Dify(d::Array{Float64, 2})
    ns, nc = size(d)
    reshape(d, (ns, 1, nc))
end
function _3Dify(d::Array{Array{Float64, 1}, 2})
    ns, nc = size(d)
    np, = size(d[1,1])

    out = zeros(ns, np, nc)
    for k in 1:nc
        for j in 1:np
            for i in 1:ns
                out[i,j,k] = d[i,k][j]
            end
        end
    end
    out
end

"""
    append_generated_quantities(trace, genq)

Given a trace (`MCMCChains` object) and a 2D array of named tuples corresponding
to `generated_quantities` output, concatenate the two together into a single
`MCMCChains` object.
"""
function append_generated_quantities(trace, genq)
    ks = keys(genq[1,1])
    nms = []
    ds = []
    for k in ks
        d = map(x -> getindex(x, k), genq)
        push!(nms, _names(k, d[1,1]))
        push!(ds, _3Dify(d))
    end
    nms = vcat(nms...)
    ds = cat(ds...; dims=2)

    nmap = trace.name_map
    atrace = cat(Array(trace, keys(nmap), append_chains=false)..., dims=3)
    Chains(cat(atrace, ds, dims=2), vcat(map(string, names(trace)), nms), nmap)
end

that would be nice to have included in MCMCChains.jl. But maybe if folks are working on better backends / extensions, there are better ideas around for this very useful facility?

@cpfiffer
Copy link
Member

You should be able to generate a separate chain with the generated quantities and then just call hcat(chain1, chain2), which (I hope) will smush them together.

@cpfiffer
Copy link
Member

But I agree the immutability here is occasionally very unpleasant to deal with.

@devmotion
Copy link
Member

Yes, I can see that this is annoying if you want to add a single column. hcat would be my suggestion as well:

julia> chain = Chains(rand(100, 3, 4), [:a, :b, :c]);

julia> d = map(chain[:a], chain[:c]) do a, c
           a + c
       end;

julia> hcat(chain, Chains(reshape(d, 100, 1, 4), [:d]))
Chains MCMC chain (100×4×4 Array{Float64, 3}):

Iterations        = 1:100
Thinning interval = 1
Chains            = 1, 2, 3, 4
Samples per chain = 100
parameters        = a, b, c, d

Summary Statistics
  parameters      mean       std   naive_se      mcse        ess      rhat 
      Symbol   Float64   Float64    Float64   Float64    Float64   Float64 

           a    0.5119    0.2856     0.0143    0.0159   467.1064    0.9995
           b    0.4879    0.2864     0.0143    0.0093   441.2094    0.9985
           c    0.5019    0.2811     0.0141    0.0052   378.1471    0.9952
           d    1.0138    0.4055     0.0203    0.0137   419.7393    0.9975

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5% 
      Symbol   Float64   Float64   Float64   Float64   Float64 

           a    0.0219    0.2711    0.5327    0.7394    0.9780
           b    0.0274    0.2256    0.4827    0.7456    0.9787
           c    0.0340    0.2338    0.5185    0.7370    0.9642
           d    0.2148    0.7257    1.0324    1.3060    1.7577

@sethaxen
Copy link
Member

sethaxen commented Jan 2, 2022

FWIW, the hcat solution does not always work when the chains are generated with Turing, because the iterations field filled by Turing and the one automatically generated by Chains may not have the same values, so one gets:

ERROR: ArgumentError: chain ranges differ

@devmotion
Copy link
Member

One has to make sure that the ranges are equal when calling hcat. Otherwise, it is unclear what the range of the output should be. One can reset ranges with resetrange or modify ranges with setrange.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants