A Discrete Bayesian Graph #210

Open
erlebach opened this issue May 27, 2022 · 16 comments

erlebach commented May 27, 2022

I am implementing a discrete Bayesian network using a few Nonlinear nodes and cannot get the messagePassingAlgorithm() method to run without error. Every edge of the resulting graph is terminated either by a Clamp or by a placeholder. Here is the code, which is rather simple:

using ForneyLab
using Plots

gA = 3
gB = 5
gC = 8

# Three deterministic functions
function grade(i, d)
    if i == 0 && d == 0
        g1,g2,g3 = [0.3, 0.4, 0.3]
    elseif i == 0 && d == 1
        g1,g2,g3 = [0.05, 0.25, 0.7]
    elseif i == 1 && d == 0
        g1,g2,g3 = [0.9, 0.08, 0.02]
    else
        g1,g2,g3 = [0.5, 0.3, 0.2]
    end
    
    return [g1, g2, g3]
end

function letter(g)
    if g == gA
        l = .9
    elseif g == gB
        l = .6
    else
        l = 0.01
    end
    return l
end

function score(i)
    if i == 0
        s = 0.05
    else
        s = 0.8
    end
    return s
end

# The model
gr = FactorGraph()

# @RV b1 ~ Clamp(0.4)
# @RV b2 ~ Clamp(0.3)
@RV d ~ Bernoulli(0.4)
@RV i ~ Bernoulli(0.3)
@RV grade_ ~ Nonlinear{Sampling}(i, d, g=grade, n_samples=50)
@RV g ~ Categorical(grade_)
@RV letter_ ~ Nonlinear{Sampling}(g, g=letter, n_samples=50)
@RV l ~ Bernoulli(letter_)
@RV score_ ~ Nonlinear{Sampling}(i, g=score, n_samples=50)
@RV s ~ Bernoulli(score_)

# placeholder(d, :d)
placeholder(i, :i)
placeholder(s, :s)
placeholder(l, :l)
placeholder(g, :g)
# The line that produces the error:
algo = messagePassingAlgorithm([d])    # <<<<< ERROR
source_code = algorithmSourceCode(algo)
eval(Meta.parse(source_code));

The error produced is:

No applicable SumProductRule{Categorical} update for Categorical node with inbound types: Message{SampleList}, Nothing

Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:33
 [2] inferUpdateRule!(entry::ScheduleEntry, rule_type::Type{SumProductRule{Categorical}}, inferred_outbound_types::Dict{Interface, Type})
   @ ForneyLab ~/src/2022/ForneyLab.jl/src/algorithms/sum_product.jl:75
 [3] inferUpdateRules!(schedule::Vector{ScheduleEntry}; inferred_outbound_types::Dict{Interface, Type})
   @ ForneyLab ~/src/2022/ForneyLab.jl/src/message_passing.jl:203
 [4] messagePassingSchedule(pf::PosteriorFactor)
   @ ForneyLab ~/src/2022/ForneyLab.jl/src/algorithms/posterior_factor.jl:75
 [5] messagePassingAlgorithm(target_variables::Vector{Variable}, pfz::PosteriorFactorization; ep_sites::Vector{Tuple}, id::Symbol, free_energy::Bool)
   @ ForneyLab ~/src/2022/ForneyLab.jl/src/algorithms/inference_algorithm.jl:82
 [6] messagePassingAlgorithm (repeats 2 times)
   @ ~/src/2022/ForneyLab.jl/src/algorithms/inference_algorithm.jl:58 [inlined]
 [7] top-level scope
   @ In[31]:1
 [8] eval
   @ ./boot.jl:373 [inlined]
 [9] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
   @ Base ./loading.jl:1196

Clearly, this means that there is no sum-product rule to handle a Categorical node with a SampleList message as input. So my question is: how is this case treated? Would a straightforward variational technique be appropriate? Thanks.

albertpod (Collaborator) commented May 29, 2022

Hi @erlebach!

I am not sure I understand your model. Can you provide the equations for your generative model, or explain what you are trying to model with this graph? It seems strange to me that you have a shared placeholder for the output of a Bernoulli distribution and the input of a Nonlinear node (e.g. placeholder(i, :i)).

First of all, there is indeed no closed-form solution (i.e. no sum-product message) in ForneyLab.jl for these incoming messages, i.e. Message{SampleList}, Nothing.

To circumvent the problem, you need to introduce a recognition distribution (the "variational technique" you mentioned) around the Categorical node. In the code below I've added one more line before the algorithm construction:

# The model
gr = FactorGraph()

# @RV b1 ~ Clamp(0.4)
# @RV b2 ~ Clamp(0.3)
@RV d ~ Bernoulli(0.4)
@RV i ~ Bernoulli(0.3)
@RV grade_ ~ Nonlinear{Sampling}(i, d, g=grade, n_samples=50)
@RV g ~ Categorical(grade_)
@RV letter_ ~ Nonlinear{Sampling}(g, g=letter, n_samples=50)
@RV l ~ Bernoulli(letter_)
@RV score_ ~ Nonlinear{Sampling}(i, g=score, n_samples=50)
@RV s ~ Bernoulli(score_)

# placeholder(d, :d)
placeholder(i, :i)
placeholder(s, :s)
placeholder(l, :l)
placeholder(g, :g)
# The line that produces the error:

q = PosteriorFactorization(g, grade_, ids=[:g, :grade_]) 
algo = messagePassingAlgorithm([d])    # <<<<< ERROR

However, this code won't work either, because one of your recognition factors, g, is clamped (connected via placeholder(g, :g)).

What you can do is remove placeholder(g, :g) so the algorithm can be built:

# The model
gr = FactorGraph()

# @RV b1 ~ Clamp(0.4)
# @RV b2 ~ Clamp(0.3)
@RV d ~ Bernoulli(0.4)
@RV i ~ Bernoulli(0.3)
@RV grade_ ~ Nonlinear{Sampling}(i, d, g=grade, n_samples=50)
@RV g ~ Categorical(grade_)
@RV letter_ ~ Nonlinear{Sampling}(g, g=letter, n_samples=50)
@RV l ~ Bernoulli(letter_)
@RV score_ ~ Nonlinear{Sampling}(i, g=score, n_samples=50)
@RV s ~ Bernoulli(score_)

# placeholder(d, :d)
placeholder(i, :i)
placeholder(s, :s)
placeholder(l, :l)
# placeholder(g, :g)
# The line that previously produced the error:

q = PosteriorFactorization(g, grade_, ids=[:g, :grade_]) 
algo = messagePassingAlgorithm([d])    # now builds without error
source_code = algorithmSourceCode(algo)
eval(Meta.parse(source_code));

THIS WORKS!
However, you might encounter other issues later. Hence, I would suggest we first discuss your model and then think about the inference.

erlebach (Author) commented May 29, 2022

Hi @albertpod:
Thank you for your investigation and suggestion, which I will try out. Let me provide some context. This problem was proposed as a class demonstration of probabilistic graphical models. The objective is to understand the relationships between intelligence (I), SAT score (S), problem difficulty (D), test grade (G), and whether the student receives a letter of recommendation (L). Here is the proposed model:

P(I,S,D,G,L) = P(I) P(D) P(G|D,I) P(L|G) P(S|I)

P(I) and P(D) are modeled as Bernoulli distributions. P(G|D,I) is modeled as a categorical distribution (we assume three different grades). The categorical probabilities depend on the inputs (D,I), so there are four different categorical triplets (whose elements each sum to 1). P(L|G) and P(S|I) are Bernoulli distributions whose probability parameter depends on their conditioning variables, so each has two probabilities. These last three conditional distributions are captured through the functions letter, score, and grade. The former two can be interpreted as mixtures of Bernoullis, and grade as a mixture of categorical distributions. (I believe the TransitionMixture factor could be of help, but I have not looked into it.)

I was initially hoping to be able to compute marginals on any unobserved variable without appealing to variational inference. I wished to experiment with choosing different subsets of observed variables and examine their effect. Furthermore, I want to be able to specify which variables are observed via a dictionary passed to some function, and have the appropriate inference executed. It should be noted that I am only providing a single observation at this stage, and the distributions of the nodes are fully specified, so an analytical solution is possible. By analytical, I mean the following: the marginal of I is a Bernoulli with parameter p=0.3. If I specify that s=1, for example, then I becomes a Bernoulli with a different, increased parameter. This parameter can be computed exactly. I can also perform a Monte Carlo simulation to compute it, which is what I did in PyMC. I thought that ForneyLab would be able to apply expectation propagation to do this, since it can be done manually (with pencil and paper). I do realize that if I place a prior on the probability of intelligence, then its posterior can be evaluated via variational inference. But that was not my initial intent.
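
A minimal sketch of this exact computation (hypothetical, not ForneyLab code), by brute-force enumeration of the joint, reusing the grade, letter, and score functions defined above:

# Exact P(I | S = 1) by enumerating the joint P(I, D, G, L, S).
# Assumes the CPTs encoded by the grade(), letter(), and score() functions above.
pI(i) = i == 1 ? 0.3 : 0.7                     # prior P(I): Bernoulli(0.3)
pD(d) = d == 1 ? 0.4 : 0.6                     # prior P(D): Bernoulli(0.4)
pS(s, i) = s == 1 ? score(i) : 1 - score(i)    # likelihood P(S | I)
pL(l, g) = l == 1 ? letter(g) : 1 - letter(g)  # likelihood P(L | G)

grades = (gA, gB, gC)   # the three grade outcomes

post = zeros(2)         # unnormalized posterior over I, given s = 1
for i in 0:1, d in 0:1, (k, g) in enumerate(grades), l in 0:1
    post[i + 1] += pI(i) * pD(d) * grade(i, d)[k] * pL(l, g) * pS(1, i)
end
post ./= sum(post)
println(post)   # ≈ [0.127, 0.873]: observing s = 1 raises P(I = 1) from 0.3 to ≈ 0.87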

Note that I might want to observe g. What does one do in that case? In reality, we can consider the current model to have a plate with N measurements, and I am considering the case of N=1, i.e., the simplest case. That should be possible, no?

I have implemented the model successfully in PyMC, if that is of interest to you. These implementations serve as exercises to develop expertise in a variety of frameworks in Julia and Python.

For reference, I have never heard the term recognition distribution before.

@erlebach (Author)

How about we move this to the discussions?

@albertpod (Collaborator)

Sorry, I noticed the new topic on Discussions before I read your answer, @erlebach.

As for the recognition distribution q: this term is also referred to as an instrumental distribution. It arises when we approximate our posterior distribution p with some reference probability distribution q (the recognition distribution). In this case, we try to minimize the KL divergence between q and p, since the log evidence can't be computed.
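
In symbols (standard variational inference, stated here for reference): the recognition distribution is chosen as

$$q^{\ast} = \arg\min_{q \in \mathcal{Q}} \mathrm{KL}\left[\, q \,\|\, p(\cdot \mid \mathrm{data}) \,\right],$$

and since the log evidence cannot be computed, this is done indirectly by minimizing the variational free energy (the negative evidence lower bound).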

We had a short discussion with @ThijsvdLaar about your issue. It seems like you could be just fine with TransitionMixture for this problem. @ThijsvdLaar will get back to you regarding this.

I understand your wish to observe g, but then (to my understanding) your graph will change as well, i.e., you won't need P(G|D, I).

In the meantime, I will try to come up with the inference solution for your model using a Nonlinear node.

@erlebach (Author)

Thank you! Quick question on something I never understood: if a variable is observed, its probability distribution is quite irrelevant, is that not true? However, when estimating a likelihood, its probability distribution still impacts the posterior through the remaining non-observed variables. If true, P(G|D, I) is still needed. In any case, I would ideally like the flexibility of observing or not observing any subset of random variables without having to write different methods for each combination of observables. Couldn't an observation be implemented in a general manner through multiplication by a delta function? This is possible with MCMC, but perhaps not with message passing.

@albertpod (Collaborator)

I am not sure I understood your question. What do you mean by an irrelevant distribution? (Let's continue this part in the Discussion; it is always helpful to draw the graph and inspect how the messages flow given your graph structure.)

As for the solution with a Nonlinear node: it seems we have found a bug that doesn't allow us to proceed with the mix of Bernoulli and Categorical distributions. However, a Bernoulli can always be converted to a Categorical distribution, e.g. Bernoulli(0.3) <-> Categorical([0.3, 0.7]). Given that, I rewrote your model in the following way:

using ForneyLab
using ProgressMeter

gA = 3
gB = 5
gC = 8

# Three deterministic functions
function grade(i, d)
    if i == 0 && d == 0
        g1,g2,g3 = [0.3, 0.4, 0.3]
    elseif i == 0 && d == 1
        g1,g2,g3 = [0.05, 0.25, 0.7]
    elseif i == 1 && d == 0
        g1,g2,g3 = [0.9, 0.08, 0.02]
    else
        g1,g2,g3 = [0.5, 0.3, 0.2]
    end
    
    return [g1, g2, g3]
end

function letter(g)
    if g == gA
l = .9
    elseif g == gB
        l = .6
    else
        l = 0.01
    end
return [l, 1-l] # changed to comply with Categorical distribution
end

function score(i)
    if i[1] == 0
        s = 0.05
    else
        s = 0.8
    end
return [s, 1-s] # changed to comply with Categorical distribution
end

# The model
gr = FactorGraph()

@RV d ~ Categorical([0.4, 0.6]) # equivalent to Bernoulli(0.4)
@RV i ~ Categorical([0.3, 0.7]) # equivalent to Bernoulli(0.3)
@RV grade_ ~ Nonlinear{Sampling}(i, d, g=grade, n_samples=50, dims=[(3,),(2,),(2,)]) # we need to specify the dimensionality for each interface, i.e. grade_, i, d
@RV g ~ Categorical(grade_)
@RV letter_ ~ Nonlinear{Sampling}(g, g=letter, n_samples=50, dims=[(2,),(2,),(2,)])
@RV l ~ Categorical(letter_)
@RV score_ ~ Nonlinear{Sampling}(i, g=score, n_samples=50, dims=[(2,),(2,),(2,)])
@RV s ~ Categorical(score_)

placeholder(s, :s, dims=(2,))
placeholder(l, :l, dims=(2,))


q = PosteriorFactorization(g, grade_, ids=[:G, :Grade]) 
algo = messagePassingAlgorithm([d])
source_code = algorithmSourceCode(algo)
eval(Meta.parse(source_code))

# Prior statistics

vmp_its = 100
data = Dict(:s => [1.0, 0.0], :l=>[0.0, 1.0])


marginals = Dict{Any, Any}(:g => ProbabilityDistribution(Univariate, Categorical, p=ones(3)/3)) # uniform initial marginal over the three grades

messages = initGrade()
@showprogress for i in 1:vmp_its
    stepGrade!(data, marginals, messages)
    stepG!(data, marginals, messages)
end

println(ForneyLab.unsafeMean(marginals[:i]))
println(ForneyLab.unsafeMean(marginals[:d]))
println(ForneyLab.unsafeMean(marginals[:g]))

As a side note: at the moment, our lab focuses on ReactiveMP.jl. Only three people can tackle ForneyLab issues (two of them are writing theses, and one will be unavailable for a few days). As a consequence, our replies will be slow, but they will come sooner or later :) Thanks for trying FL in this setting; it helps to improve the framework.

@erlebach (Author)

Thanks! One question: you wrote:

@RV letter_ ~ Nonlinear{Sampling}(g, g=letter, n_samples=50, dims=[(2,),(2,),(2,)])

Shouldn't this be written as

@RV letter_ ~ Nonlinear{Sampling}(g, g=letter, n_samples=50, dims=[(2,),(2,)])

I am surprised this would have worked for you.

@albertpod (Collaborator)

@erlebach ah, right, I made a typo. Your variant is correct.

FL just ignored the last dimensionality specification. We should throw an error or a warning when a user does this.

@erlebach (Author)

Great. I ran the code, and it does not work. But I have a question. You wrote:

data = Dict(:s =>[1.0, 0.0], :l => [0.0, 1.0])

Why is that? When sampled, doesn't a Categorical return a single number (0 or 1, or 1 or 2)? Why are you specifying a vector of length 2? Or are you specifying two observed values? I just want to make sure, since in my original problem I only assigned one observed value.

When executing stepGRADE!, I get the error:

TypeError: in keyword argument m, expected Number, got a value of type Vector{Float64}.

Looking more closely, the error seemed to be in ProgressMeter, but I removed ProgressMeter and still get the same error.
More specifically:

TypeError: in keyword argument m, expected Number, got a value of type Vector{Float64}

Stacktrace:
 [1] stepGRADE!(data::Dict{Symbol, Vector{Float64}}, marginals::Dict{Any, Any}, messages::Vector{Message})
   @ Main ./none:17
 [2] top-level scope
   @ ./In[69]:3
 [3] eval
   @ ./boot.jl:373 [inlined]
 [4] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
   @ Base ./loading.jl:1196

@albertpod (Collaborator)

I have a feeling that you haven't specified the dimensionality of your placeholders as I did, i.e. placeholder(s, :s, dims=(2,)).
Are you sure you are using exactly the same code?

As for the data: the Categorical node in FL defines a one-dimensional probability distribution over the standard basis vectors of dimension d. The output is one-hot encoded, meaning that for outcome 1 we use (1, 0), for outcome 2 we use (0, 1), etc.
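
A tiny hypothetical helper (not part of FL's API) illustrating this encoding:

# One-hot encode outcome x ∈ {1, ..., k} as a length-k vector.
onehot(x::Int, k::Int) = [j == x ? 1.0 : 0.0 for j in 1:k]

onehot(1, 2)  # [1.0, 0.0] — first category
onehot(2, 2)  # [0.0, 1.0] — second category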

@erlebach (Author)

You are correct, @albertpod; I had not specified the dimensionality of the placeholders. I did not copy/paste your code, but instead modified mine. That is how I learn faster.

I did not realize that a Categorical was one-hot encoded. Makes sense (different from PyMC).

The code now runs. Whether it runs correctly is another matter. I am surprised that 100 iterations of a variational method run slower than PyMC running 5,000 iterations of a Monte Carlo sampler. Of course, the variational approach provides a parametrized distribution, and I have not been careful to make precise measurements. I guess the answer is that I am using the nonlinear sampler three times in my model. :-)

Why do you use unsafeMean rather than mean()? I cannot find any documentation on unsafeMean. What does it do?

albertpod (Collaborator) commented May 30, 2022

I understand, no worries.

Well, actually you also use sampling. ForneyLab knows how to combine different message-passing techniques; here you have used hybrid message passing, i.e. SP, VMP, and sampling-based messages (around the Nonlinear nodes). It's hard to compare iterations from one package to another, but I am also surprised by your observation.

I would also be suspicious about the correctness of the result. Unfortunately, for this model we cannot inspect the free energy (our minimization target), so we don't know when the algorithm converges: within 1, 2, 3, or 100 iterations. Maybe you can check how your marginals change over iterations.

We still need to check a few things on our side. Perhaps, we'll come up with a better solution.

As for unsafeMean, that's just a habit (old-school FL); you can safely use mean now.
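
For example, the last lines of the script above could then read (assuming mean is in scope, e.g. via using Statistics):

println(mean(marginals[:i]))
println(mean(marginals[:d]))
println(mean(marginals[:g]))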

erlebach (Author) commented May 30, 2022

All good points. Actually, I am not convinced of convergence either. Marginals look fine. To check my marginals over iterations, I would have to output data using a callback function. Is that possible?

I can also create a LaTeX document explaining more clearly what I am doing at this point. You'll note that I have not placed a prior on the various probabilities that I am specifying when stating my model, so I am not doing inference in the usual sense of the word. I have a joint distribution that is fully known and am simply fixing some parameter values.

By the way, fixing the data as :s => [1., 0.] is as expected, since the output of a Bernoulli is [1,0] or [0,1]. What happens if I set my data to [0,0] or [0.5, 0.5]? I would not expect [0,0] to work, since the elements do not sum to 1. [0.5, 0.5] probably won't work either, since my samples can only be [0,1] or [1,0]. At least that is my expectation when using Bernoullis.

If you have any experience with PyMC, I can send you my code (which works) so you can get a better idea of what I am doing. Or I can email you a PDF that explains things in more detail.

albertpod (Collaborator) commented Jun 1, 2022

Hi @erlebach!

Here's pseudocode for tracking your marginals:

marginals = Dict{Any, Any}(:g => ProbabilityDistribution(Univariate, Categorical, p=ones(3)/3))
marginal_storage = Dict{Any, Any}(:g => [], :i => [], :d => []) # or anything else you are interested in
messages = initGrade()
@showprogress for i in 1:vmp_its
    marginals = stepGrade!(data, marginals, messages)
    push!(marginal_storage[:i], marginals[:i])
    push!(marginal_storage[:d], marginals[:d])
    marginals = stepG!(data, marginals, messages)
    push!(marginal_storage[:g], marginals[:g])
end
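
As a hypothetical follow-up (not from the thread), one way to check convergence from the stored marginals is to look at how much the posterior means move between iterations:

# Largest change in the posterior mean of :i between successive iterations.
means  = [ForneyLab.unsafeMean(m) for m in marginal_storage[:i]]
deltas = [maximum(abs.(means[t] .- means[t-1])) for t in 2:length(means)]
println(findfirst(d -> d < 1e-6, deltas))  # first iteration where the mean barely moves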

Having a rigorous explanation of the model will be helpful, but I am afraid we won't have much time to look into it at the moment.

As for the data: in FL, the Categorical node does not care how you encode the observation. For example, you can say :s => [100.0, 20.0]. Given this observation, the node will compute a backward message. In the case of a Categorical likelihood, the backward message turns out to be a Dirichlet, which is parametrized by concentration parameters (one per category). The only constraint on the concentration parameters is that they are greater than 0. Given :s => [100.0, 20.0], the message carrying the Dirichlet distribution will have concentration parameters 101 and 21. If you don't have a good intuition about the concentration parameters of a Dirichlet, I recommend checking the wiki. The question, however, is whether the observation [100.0, 20.0] encodes your actual observations. Maybe. But I would rather go with one-hot encoding: when mapping Bernoulli output to Categorical, you can say that 0.0 <-> [1.0, 0.0] and 1.0 <-> [0.0, 1.0].
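
A small numerical sketch of that claim (assuming, as described above, that the backward message is a Dirichlet whose concentration parameters are the observation counts plus one):

o = [100.0, 20.0]
α = o .+ 1              # concentration parameters: [101.0, 21.0]
println(α ./ sum(α))    # implied mean over the two categories: ≈ [0.828, 0.172]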

I recommend inspecting the generated code:
println(source_code)
Drawing FG also helps:
draw()

P.S. Changing Bernoulli distributions to Categorical is a temporary solution. Your issue indicated that there is a bug in FL. We will start working on it soon.

erlebach (Author) commented Jun 2, 2022

Thank you for this explanation. I will try all this out!

@albertpod (Collaborator)

@erlebach hi!
Not sure if this is still relevant, but ReactiveMP.jl has released message-passing with a boolean function.

You may want to look at this demo.
