
SDDP.jl example #98

Open · slwu89 opened this issue Apr 26, 2023 · 30 comments

@slwu89
Contributor

slwu89 commented Apr 26, 2023

Hi @sdwfrost, I was interested to see if I could get something similar to your InfiniteOpt.jl example running using SDDP.jl. I got this far, but the solvers keep reporting numerical issues. I'm not familiar with the paper that your InfiniteOpt example was based on - do you see anything obviously wrong with my model here?

using SDDP, JuMP, Ipopt
using Plots
using Random
using BenchmarkTools

tmax = 40.0
tspan = (0.0,tmax);

δt = 0.1
t = 0:δt:tmax;
nsteps = Int(tmax / δt);

u0 = [990,10,0]; # S,I,R
β = 0.5 # infectivity rate
γ = 0.25 # recovery rate
υ_max = 0.5 # maximum intervention
υ_total = 10.0; # maximum cost

model = SDDP.LinearPolicyGraph(
    stages = nsteps,
    sense = :Min,
    lower_bound = 0,
    optimizer = Ipopt.Optimizer,
) do sp, t

    @variable(sp, 0 ≤ S, SDDP.State, initial_value = u0[1])
    @variable(sp, 0 ≤ I, SDDP.State, initial_value = u0[2])
    @variable(sp, 0 ≤ R, SDDP.State, initial_value = u0[3])

    @variable(sp, 0 ≤ C, SDDP.State, initial_value = 0)

    @variable(sp, 0 ≤ υ_cumulative, SDDP.State, initial_value = 0)
    @variable(sp, 0 ≤ υ ≤ υ_max)

    # constraints on control
    @constraint(sp, υ_cumulative.out == υ_cumulative.in + (δt * υ))
    @constraint(sp, υ_cumulative.in ≤ υ_total)

    # state updating rules
    @NLconstraint(sp, S.out == S.in - ((1-exp((1 - υ) * β * S.in * I.in * δt)) * S.in))
    @NLconstraint(sp, I.out == I.in + ((1-exp((1 - υ) * β * S.in * I.in * δt)) * S.in) - ((1-exp(-γ*δt)) * I.in))
    @NLconstraint(sp, R.out == R.in + ((1-exp(-γ*δt)) * I.in))
    @NLconstraint(sp, C.out == C.in + ((1-exp((1 - υ) * β * S.in * I.in * δt)) * S.in))

    @stageobjective(sp, C.out)
end

SDDP.train(model; iteration_limit = 10)
@sdwfrost
Contributor

Hi, the only major difference is that the initial conditions are not the same - you would have to scale β for a population size of 1000, but that wasn't the problem. I also tried putting the upper bound in the variable declaration rather than as a constraint:

@variable(sp, 0 ≤ υ_cumulative ≤ υ_total, SDDP.State, initial_value = 0)

but that didn't fix things either - I'm not sure whether this is the correct approach in SDDP.jl anyway.

There's an additional x variable in the JSON file that is written to the working directory, but I don't know whether that's a bad thing or not. Here's my stripped down version that more closely matches the InfiniteOpt.jl one:

using SDDP, JuMP, Ipopt

tmax = 40.0
δt = 0.1
nsteps = Int(tmax / δt)

u0 = [0.99,0.01] # S,I
β = 0.5 # infectivity rate
γ = 0.25 # recovery rate
υ_max = 0.5 # maximum intervention
υ_total = 10.0 # maximum cost

model = SDDP.LinearPolicyGraph(
    stages = nsteps,
    sense = :Min,
    lower_bound = 0,
    optimizer = Ipopt.Optimizer,
) do sp, t

    @variable(sp, 0 ≤ S, SDDP.State, initial_value = u0[1])
    @variable(sp, 0 ≤ I, SDDP.State, initial_value = u0[2])
    @variable(sp, 0 ≤ C, SDDP.State, initial_value = 0)

    @variable(sp, 0 ≤ υ_cumulative, SDDP.State, initial_value = 0)
    #@variable(sp, 0 ≤ υ_cumulative ≤ υ_total, SDDP.State, initial_value = 0)
    @variable(sp, 0 ≤ υ ≤ υ_max)

    # constraints on control
    @constraint(sp, υ_cumulative.out == υ_cumulative.in + (δt * υ))
    @constraint(sp, υ_cumulative.in ≤ υ_total)

    # state updating rules
    @NLconstraint(sp, S.out == S.in - ((1-exp((1 - υ) * β * S.in * I.in * δt)) * S.in))
    @NLconstraint(sp, I.out == I.in + ((1-exp((1 - υ) * β * S.in * I.in * δt)) * S.in) - ((1-exp(-γ*δt)) * I.in))
    @NLconstraint(sp, C.out == C.in + ((1-exp((1 - υ) * β * S.in * I.in * δt)) * S.in))

    @stageobjective(sp, C.out)
end

SDDP.train(model; iteration_limit = 10)

I'd post this on the Julia Discourse in the Mathematical Optimization category, and if that doesn't help, try filing an issue in case it really is a corner case.

@sdwfrost
Contributor

I noticed that you had some errors in the transitions: you need to use the negative of the rate in the (1-exp(-rt)) terms, and you also included S.in in the force of infection where it should just be I.in. I also added a warm start to the control variable. Fixes below - I still get an error though.

using SDDP, JuMP, Ipopt

tmax = 40.0
δt = 0.1
nsteps = Int(tmax / δt)

u0 = [0.99, 0.01] # S,I
β = 0.5 # infectivity rate
γ = 0.25 # recovery rate
υ_max = 0.5 # maximum intervention
υ_total = 10.0 # maximum cost

model = SDDP.LinearPolicyGraph(
    stages = nsteps,
    sense = :Min,
    lower_bound = 0,
    optimizer = Ipopt.Optimizer,
) do sp, t

    @variable(sp, 0 ≤ S, SDDP.State, initial_value = u0[1])
    @variable(sp, 0 ≤ I, SDDP.State, initial_value = u0[2])
    @variable(sp, 0 ≤ C, SDDP.State, initial_value = 0)

    @variable(sp, 0 ≤ υ_cumulative, SDDP.State, initial_value = 0)
    @variable(sp, 0 ≤ υ ≤ υ_max, start = 0)

    # constraints on control
    @constraint(sp, υ_cumulative.out == υ_cumulative.in + (δt * υ))
    @constraint(sp, υ_cumulative.in ≤ υ_total)

    # expressions to simplify the state updates
    @NLexpression(sp, infection, (1-exp(-(1 - υ) * β * I.in * δt)) * S.in)
    @NLexpression(sp, recovery, (1-exp(-γ*δt)) * I.in)

    # state updating rules
    @NLconstraint(sp, S.out == S.in - infection)
    @NLconstraint(sp, I.out == I.in + infection - recovery)
    @NLconstraint(sp, C.out == C.in + infection)

    @stageobjective(sp, C.out)
end

SDDP.train(model; iteration_limit = 10)

@sdwfrost
Contributor

The above breaks at step 202. If you reduce the simulation time to e.g. tmax=20.0 (i.e. 200 steps), then the model can get through an iteration loop. Any idea how to get a vector of υ? I'm not sure why there is a problem at this stage.

using SDDP, JuMP, Ipopt

tmax = 20.0
δt = 0.1
nsteps = Int(tmax / δt)

u0 = [0.99, 0.01] # S,I
β = 0.5 # infectivity rate
γ = 0.25 # recovery rate
υ_max = 0.5 # maximum intervention
υ_total = 10.0 # maximum cost

model = SDDP.LinearPolicyGraph(
    stages = nsteps,
    sense = :Min,
    lower_bound = 0,
    optimizer = Ipopt.Optimizer,
) do sp, t

    @variable(sp, 0 ≤ S, SDDP.State, initial_value = u0[1])
    @variable(sp, 0 ≤ I, SDDP.State, initial_value = u0[2])
    @variable(sp, 0 ≤ C, SDDP.State, initial_value = 0)

    @variable(sp, 0 ≤ υ_cumulative, SDDP.State, initial_value = 0)
    @variable(sp, 0 ≤ υ ≤ υ_max, start = 0)

    # constraints on control
    @constraint(sp, υ_cumulative.out == υ_cumulative.in + (δt * υ))
    @constraint(sp, υ_cumulative.in ≤ υ_total)

    # expressions to simplify the state updates
    @NLexpression(sp, infection, (1-exp(-(1 - υ) * β * I.in * δt)) * S.in)
    @NLexpression(sp, recovery, (1-exp(-γ*δt)) * I.in)

    # state updating rules
    @NLconstraint(sp, S.out == S.in - infection)
    @NLconstraint(sp, I.out == I.in + infection - recovery)
    @NLconstraint(sp, C.out == C.in + infection)

    @stageobjective(sp, C.out)
end

SDDP.train(model; iteration_limit = 10)
rule = SDDP.DecisionRule(model; node = 1)
solution = SDDP.evaluate(rule; incoming_state=Dict(:C => 0.0))

@sdwfrost
Contributor

In the deterministic case, we can just use JuMP, as explained in this tutorial, which may help to debug whether it's a JuMP issue or an SDDP.jl issue (though I see where you're going with this :)).

@slwu89
Contributor Author

slwu89 commented Apr 26, 2023

Yeah, that's a good idea - it would be nice to have a JuMP -> deterministic SDDP -> stochastic SDDP workflow anyway. I'll play around with the vanilla JuMP model and see if it sheds any light on what's going wrong with the SDDP implementation.

@slwu89
Contributor Author

slwu89 commented Apr 26, 2023

@sdwfrost I didn't see your comment above. To get the state trajectory after running your last code block (to time 20), we can do this:

using Plots

sims = SDDP.simulate(model, 1, [:S,:I])
traj = [[sims[1][i][:S].in,sims[1][i][:I].in] for i in 1:200]
traj = transpose(hcat(traj...))
plot(traj)

This gives something like:
[Screenshot: plot of the S and I trajectories]
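
To also pull out the vector of υ that was asked about, the control can be recorded in the same call (a sketch; since υ is a control rather than a state, it should come back as a plain value without .in/.out):

sims = SDDP.simulate(model, 1, [:S, :I, :υ])
traj_υ = [sims[1][i][:υ] for i in 1:200]
plot(traj_υ)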

@sdwfrost
Contributor

Hi @slwu89! I added a deterministic JuMP.jl example here. There were a few issues, so I had to play a bit with number of steps and dt, but it now works well.

The analogous model in SDDP.jl doesn't work with the same parameter values and solver; it can only run for a few iterations. For some reason, the algorithm in SDDP.jl tries a solution where υ is high at the beginning, and hence it reaches a point where it can't find a feasible solution before it has gone through all nsteps. (This is why it was failing around step 20 originally.) Here's what I have so far:

using SDDP, JuMP, Ipopt, Plots

tmax = 100.0
δt = 1.0
nsteps = Int(tmax / δt)

u0 = [0.99, 0.01] # S,I
β = 0.5 # infectivity rate
γ = 0.25 # recovery rate
υ_max = 0.5 # maximum intervention
υ_total = 10.0 # maximum cost
opt=Ipopt.Optimizer

model = SDDP.LinearPolicyGraph(
    stages = nsteps,
    sense = :Min,
    lower_bound = 0,
    optimizer = opt,
) do sp, t

    @variable(sp, 0 ≤ S, SDDP.State, initial_value = u0[1])
    @variable(sp, 0 ≤ I, SDDP.State, initial_value = u0[2])
    @variable(sp, 0 ≤ C, SDDP.State, initial_value = 0)

    @variable(sp, 0 ≤ υ_cumulative, SDDP.State, initial_value = 0)
    @variable(sp, 0 ≤ υ ≤ υ_max, start = 0)

    # constraints on control
    @constraint(sp, υ_cumulative.out == υ_cumulative.in + (δt * υ))
    @constraint(sp, υ_cumulative.out ≤ υ_total)

    # expressions to simplify the state updates
    @NLexpression(sp, infection, (1-exp(-(1 - υ) * β * I.in * δt)) * S.in)
    @NLexpression(sp, recovery, (1-exp(-γ*δt)) * I.in)

    # state updating rules
    @NLconstraint(sp, S.out == S.in - infection)
    @NLconstraint(sp, I.out == I.in + infection - recovery)
    @NLconstraint(sp, C.out == C.in + infection)

    @stageobjective(sp, C.out)
end

# The solver works when we convert to a JuMP model
# I don't know how to extract the values from this, however
det = SDDP.deterministic_equivalent(model, Ipopt.Optimizer)
set_silent(det)
JuMP.optimize!(det)
JuMP.termination_status(det)

# The stability report is fine
SDDP.numerical_stability_report(model)
# The solver will fail at iteration 7
SDDP.train(model; iteration_limit = 6)
sims = SDDP.simulate(model, 1, [:S,:I,:υ])
traj = [[sims[1][i][:S].in,sims[1][i][:I].in,sims[1][i][:υ]] for i in 1:nsteps]
traj = transpose(hcat(traj...))
plot(traj)

I've tried some other NLP solvers, such as MadNLP, but they don't seem to overcome the 'early lockdown' scenario.
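
On extracting values from the deterministic equivalent: I haven't tried it on this model, but since det is just a JuMP model, something like the following should let us inspect the solution (the variable names depend on how SDDP.jl labels the per-stage copies, so check them first):

vars = JuMP.all_variables(det)
println.(JuMP.name.(vars))                       # see how the per-stage copies of S, I, υ are named
S_vars = filter(v -> occursin("S", JuMP.name(v)), vars)
JuMP.value.(S_vars)                              # values after JuMP.optimize!(det)
JuMP.objective_value(det)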

@slwu89
Contributor Author

slwu89 commented May 2, 2023

@sdwfrost I found out what the problem was. The way I went about it was to read the files that SDDP spits out when it dies (with the naming convention "subproblem_XX.mof.json") back into JuMP, and then read off the LaTeX equations to find out why the subproblem at that time point wasn't solvable.

sp_moi = JuMP.read_from_file("./subproblem_XX.mof.json")
JuMP.latex_formulation(sp_moi)

Based on this, it seems the constraint @constraint(sp, υ_cumulative.out ≤ υ_total) has to be rewritten as @constraint(sp, υ_cumulative.in + (δt * υ) ≤ υ_total). The reason is that the actual value of the decision variable υ wasn't being appropriately constrained over a time step.
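
Spelled out, the change is:

# before
@constraint(sp, υ_cumulative.out ≤ υ_total)
# after: bound the incoming cumulative control plus the control applied during this stage
@constraint(sp, υ_cumulative.in + (δt * υ) ≤ υ_total)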

I've put what I have so far in my fork: https://github.com/slwu89/sir-julia/blob/slwu89/sddp/tutorials/sddp/sddp.jl

Note that currently I only have a single @stageobjective, the cumulative infected at the final time point. This seems to me to be a closer interpretation of the JuMP and InfiniteOpt objectives, which only consider the final time point. Including all intermediate time points as we did previously leads to worse-looking "optimal" policies, although even the "optimal" one returned in this example results in C of about 0.75 after 200 iterations of training, which is a lot worse than the real optimal policy. If you have any insights, I'd be grateful.

The strange looking policy resulting from training is below:

[Screenshot: policy resulting from training]

@sdwfrost
Contributor

sdwfrost commented May 3, 2023

Hi @slwu89, I plugged your fix into the model, and it's nice that it works. I think the problem is with the @stageobjective - even though you're minimizing the final size, before the final step, the use of 0 in the stageobjective means that essentially random policies are chosen at the beginning, until the policy budget reaches υ_total. I think that with a different problem that has a more local outcome, this approach will work better.

A 'wait and see' approach where you minimize the treatment until the final step waits too long:

if t < nsteps
    @stageobjective(sp, υ_cumulative.out)
else
    @stageobjective(sp, C.out)
end

I'll play around with more models to see if I can overcome the 'short sighted' nature of the current approach.

@slwu89
Contributor Author

slwu89 commented May 3, 2023

Good points @sdwfrost, I'll play around with the objective too. This is a rather interesting topic, in terms of how best to set up local objectives that get a policy close to the global one that JuMP and InfiniteOpt are able to find.

@sdwfrost
Contributor

sdwfrost commented May 3, 2023

The issue of sparse rewards comes up a lot in reinforcement learning, where it causes problems because the algorithm finds it hard to work out which parts of the policy contributed to the outcome. Using a different objective is like reward shaping. I really wanted to put together an example using ReinforcementLearning.jl but got stuck trying to think about how to overcome the local/global problem.

@slwu89
Contributor Author

slwu89 commented May 4, 2023

Well, it's an interesting problem - it may be useful to have the example even if we can't settle on the ideal objective function here, if it is still an open research question (at least with respect to minimizing infections in an SIR model)! I tried penalizing the objective function by adding a "control whiplash" term to avoid rapid changes in $\upsilon$, either the absolute value of the change in cumulative control or its square (although I'm not sure whether using @NLconstraint violates some assumptions required by SDDP):

@variable(sp, z)

# absolute value penalty: |υ_cumulative.out - υ_cumulative.in| ≤ z
@constraint(sp, υ_cumulative.out - υ_cumulative.in ≤ z)
@constraint(sp, υ_cumulative.in - υ_cumulative.out ≤ z)

# quadratic penalty
@NLconstraint(sp, (υ_cumulative.out - υ_cumulative.in)^2 == z)

In general I got smoother control policies, but still far from the optimal one found by JuMP/InfiniteOpt when we can do global optimization:

[Screenshot: control policy obtained with the smoothness penalty]

@sdwfrost
Contributor

sdwfrost commented May 4, 2023

You can get something closer if you try to minimize a linear combination of the cumulative cost and the cumulative infections (i.e. trying to keep costs down and infections down, with a weighting factor).

@stageobjective(sp, υ_cumulative.out + 40*C.out)

A smoother penalty may not help, as the optimal policy includes two sharp changes.

@slwu89
Contributor Author

slwu89 commented May 4, 2023

Good point. I was trying that penalty to promote smoothness and cut down on the many spiky changes in policy I saw in some of the other training runs, but yes, it won't handle the abrupt single-lockdown policy very well.

@sdwfrost
Contributor

sdwfrost commented May 7, 2023

Hi @slwu89 - do you want to commit a version of this model using the weighted sum as an objective? As you say, it's an interesting problem. I'm happy to work it up into a jmd if you're busy.

@slwu89
Contributor Author

slwu89 commented May 8, 2023

Sure! I'll get a PR ready for you in a day or two. There's a lot of development going on in SDDP.jl; I get the feeling that we'll update this tutorial as new features get added to the package. It's exciting stuff.

@slwu89
Contributor Author

slwu89 commented May 10, 2023

@sdwfrost I was hoping to get a stochastic example working for the PR but I don't think it's possible right now. In case things change in JuMP that make this possible (which the JuMP dev team is certainly thinking about, see jump-dev/MathOptInterface.jl#846), I'm going to post where I got to.

Because SDDP does not directly support random variables parameterized by state variables, we need to be able to feed in an exogenous source of noise to each node in the policy graph that we can transform into what we need. The most straightforward way to go about this would be to simulate an Euler-Maruyama discretization of an SDE.

At each step we first draw 2 realizations of normal random variates (as in https://odow.github.io/SDDP.jl/stable/guides/add_multidimensional_noise/) and then put them into the constraint matrix so that the nonlinear expressions for the diffusion terms are multiplied by those variates (following https://odow.github.io/SDDP.jl/stable/guides/add_noise_in_the_constraint_matrix/). The issue is that JuMP.set_normalized_coefficient only works for affine and quadratic expressions; perhaps that will change in the future. Anyway, I leave it here.
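
For reference, one step of the Euler-Maruyama scheme that the model below discretizes looks roughly like this in plain Julia (a sketch, independent of SDDP; em_step is just an illustrative helper, and g = sqrt(f) matches the diffusion terms used below):

using Distributions

# One Euler-Maruyama step: X(t+δt) = X(t) + f(X)*δt + g(X)*ΔW, with ΔW ~ Normal(0, sqrt(δt)),
# applied to the controlled SIR model with g = sqrt(f) for each transition.
function em_step(S, I, C, υ, β, γ, δt)
    f_inf = (1 - υ) * β * I * S              # infection drift
    f_rec = γ * I                            # recovery drift
    ΔW_inf = rand(Normal(0, sqrt(δt)))       # exogenous noise for infection
    ΔW_rec = rand(Normal(0, sqrt(δt)))       # exogenous noise for recovery
    d_inf = f_inf * δt + sqrt(f_inf) * ΔW_inf
    d_rec = f_rec * δt + sqrt(f_rec) * ΔW_rec
    return S - d_inf, I + d_inf - d_rec, C + d_inf
end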

model_stochastic = SDDP.LinearPolicyGraph(
    stages = nsteps,
    sense = :Min,
    lower_bound = 0,
    optimizer = opt,
) do sp, t

    @variable(sp, 0 ≤ S, SDDP.State, initial_value = u0[1])
    @variable(sp, 0 ≤ I, SDDP.State, initial_value = u0[2])
    @variable(sp, 0 ≤ C, SDDP.State, initial_value = 0)

    @variable(sp, 0 ≤ υ_cumulative, SDDP.State, initial_value = 0)
    @variable(sp, 0 ≤ υ ≤ υ_max)

    # constraints on control
    @constraint(sp, υ_cumulative.out == υ_cumulative.in + (δt * υ))
    @constraint(sp, υ_cumulative.in + (δt * υ) ≤ υ_total)

    # drift terms
    @NLexpression(sp, f_infection, (1 - υ) * β * I.in * S.in)
    @NLexpression(sp, f_recovery, γ * I.in)

    # diffusion terms
    @NLexpression(sp, g_infection, sqrt(f_infection))
    @NLexpression(sp, g_recovery, sqrt(f_recovery))

    # state updating rules
    @NLconstraint(sp, S_update, S.out == S.in - 1f_infection - 1g_infection)
    @NLconstraint(sp, I_update, I.out == I.in + 1f_infection - 1f_recovery + 1g_infection - 1g_recovery)
    @NLconstraint(sp, C_update, C.out == C.in + 1f_infection + 1g_infection)

    # sample the normal random variates
    Ω1 = sort(rand(Distributions.Normal(0, sqrt(δt)), 100))
    Ω2 = sort(rand(Distributions.Normal(0, sqrt(δt)), 100))
    Ω = [(inf = ω1, rec = ω2) for ω1 in Ω1 for ω2 in Ω2]
    SDDP.parameterize(sp, Ω) do ω
        # drift terms
        JuMP.set_normalized_coefficient(S_update, f_infection, δt) # if this doesn't work set f_infection to a variable and not an expression
        JuMP.set_normalized_coefficient(I_update, f_infection, δt)
        JuMP.set_normalized_coefficient(I_update, f_recovery, δt)
        JuMP.set_normalized_coefficient(C_update, f_infection, δt)
        # diffusion terms
        JuMP.set_normalized_coefficient(S_update, g_infection, ω.inf)
        JuMP.set_normalized_coefficient(I_update, g_infection, ω.inf)
        JuMP.set_normalized_coefficient(I_update, g_recovery, ω.rec)
        JuMP.set_normalized_coefficient(C_update, g_infection, ω.inf)
    end

    # linear weighting of objectives
    @stageobjective(sp, υ_cumulative.out + 40*C.out)

end

@slwu89 slwu89 mentioned this issue May 10, 2023
@sdwfrost
Contributor

Thanks Sean! Here's hoping that the JuMP team add nonlinear expressions to set_normalized_coefficient.

@odow

odow commented May 28, 2023

I was about to merge the PR in MathOptInterface when I noticed a linked issue with "SDDP.jl" in the title, which piqued my interest. This is cool stuff!

You can work around the set_normalized_coefficient issue by making f_infection etc. a decision variable that is constrained to equal the expression. I didn't test it, but something like this should work:

model_stochastic = SDDP.LinearPolicyGraph(
    stages = nsteps,
    sense = :Min,
    lower_bound = 0,
    optimizer = opt,
) do sp, t

    @variable(sp, 0 ≤ S, SDDP.State, initial_value = u0[1])
    @variable(sp, 0 ≤ I, SDDP.State, initial_value = u0[2])
    @variable(sp, 0 ≤ C, SDDP.State, initial_value = 0)

    @variable(sp, 0 ≤ υ_cumulative, SDDP.State, initial_value = 0)
    @variable(sp, 0 ≤ υ ≤ υ_max)

    # constraints on control
    @constraint(sp, υ_cumulative.out == υ_cumulative.in + (δt * υ))
    @constraint(sp, υ_cumulative.in + (δt * υ) ≤ υ_total)

    # drift terms
    @variable(sp, f_infection)
    @NLconstraint(sp, f_infection == (1 - υ) * β * I.in * S.in)
    @variable(sp, f_recovery)
    @NLconstraint(sp, f_recovery == γ * I.in)

    # diffusion terms
    @variable(sp, g_infection)
    @NLconstraint(sp, g_infection == sqrt(f_infection))
    @variable(sp, g_recovery)
    @NLconstraint(sp, g_recovery == sqrt(f_recovery))

    # state updating rules
    @constraint(sp, S_update, S.out == S.in - 1f_infection - 1g_infection)
    @constraint(sp, I_update, I.out == I.in + 1f_infection - 1f_recovery + 1g_infection - 1g_recovery)
    @constraint(sp, C_update, C.out == C.in + 1f_infection + 1g_infection)

    # sample the normal random variates
    Ω1 = sort(rand(Distributions.Normal(0, sqrt(δt)), 100))
    Ω2 = sort(rand(Distributions.Normal(0, sqrt(δt)), 100))
    Ω = [(inf = ω1, rec = ω2) for ω1 in Ω1 for ω2 in Ω2]
    SDDP.parameterize(sp, Ω) do ω
        # drift terms
        JuMP.set_normalized_coefficient(S_update, f_infection, δt) # if this doesn't work set f_infection to a variable and not an expression
        JuMP.set_normalized_coefficient(I_update, f_infection, δt)
        JuMP.set_normalized_coefficient(I_update, f_recovery, δt)
        JuMP.set_normalized_coefficient(C_update, f_infection, δt)
        # diffusion terms
        JuMP.set_normalized_coefficient(S_update, g_infection, ω.inf)
        JuMP.set_normalized_coefficient(I_update, g_infection, ω.inf)
        JuMP.set_normalized_coefficient(I_update, g_recovery, ω.rec)
        JuMP.set_normalized_coefficient(C_update, g_infection, ω.inf)
    end

    # linear weighting of objectives
    @stageobjective(sp, υ_cumulative.out + 40*C.out)

end

@sdwfrost
Contributor

Thanks @odow! Your tweak compiles, but gives an infeasible solution - @slwu89, do you want to try to debug?

@sdwfrost sdwfrost reopened this May 29, 2023
@odow

odow commented May 29, 2023

but gives an infeasible solution

A core assumption of SDDP.jl is relatively complete recourse. That is, there is always a feasible solution for any choice of the incoming state variables and the random variables.

I assume that the issue is

@constraint(sp, υ_cumulative.in + (δt * υ) ≤ υ_total)

Use a penalized constraint like this (for some appropriate penalty cost):

@variable(sp, penalty >= 0)
@constraint(sp, υ_cumulative.in + (δt * υ) ≤ υ_total + penalty)
@stageobjective(sp, υ_cumulative.out + 40*C.out + 1_000 * penalty)

@sdwfrost
Contributor

Thanks - this may well be a problem further down the line (and I had to play a bit with the deterministic version to overcome this constraint issue), but this is a problem with the model itself, as I get an infeasible solution at the first stage.

@odow

odow commented May 29, 2023

It's potentially the non-differentiability of @NLconstraint(sp, g_infection == sqrt(f_infection)) at f_infection = 0. Ipopt uses first- and second-order derivatives during the solve.

What if you tweak the formulation to

model_stochastic = SDDP.LinearPolicyGraph(
    stages = nsteps,
    sense = :Min,
    lower_bound = 0,
    optimizer = opt,
) do sp, t
    @variables(sp, begin
        0 <= S, SDDP.State, (initial_value = u0[1])
        0 <= I, SDDP.State, (initial_value = u0[2])
        0 <= C, SDDP.State, (initial_value = 0)
        0 <= υ_cumulative <= υ_total, SDDP.State, (initial_value = 0)
        0 <= υ <= υ_max
        ω_inf
        ω_rec
        penalty >= 0
        infection >= 0
        recovery >= 0
    end)
    @NLconstraints(sp, begin
        infection == δt * ((1 - υ) * β * I.in * S.in) + ω_inf * sqrt((1 - υ) * β * I.in * S.in)
        recovery == δt * (γ * I.in) + ω_rec * sqrt(γ * I.in)
    end)
    @constraints(sp, begin
        υ_cumulative.out == υ_cumulative.in + δt * υ - penalty
        S.out == S.in - infection
        I.out == I.in + infection - recovery
        C.out == C.in + infection
    end)
    D = Distributions.Normal(0, sqrt(δt))
    Ω1, Ω2 = sort(rand(D, 100)), sort(rand(D, 100))
    Ω = [(inf = ω1, rec = ω2) for ω1 in Ω1 for ω2 in Ω2]
    SDDP.parameterize(sp, Ω) do ω
        JuMP.fix(ω_inf, ω.inf)
        JuMP.fix(ω_rec, ω.rec)
    end
    @stageobjective(sp, υ_cumulative.out + 40 * C.out + 100 * penalty)
end
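
If the square root is still the problem in the formulation with explicit g_infection / g_recovery variables, a common workaround (a sketch, untested here) is to shift its argument away from zero by a small ε so the derivative stays bounded:

ε = 1e-8
@NLconstraint(sp, g_infection == sqrt(f_infection + ε))
@NLconstraint(sp, g_recovery == sqrt(f_recovery + ε))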

@slwu89
Contributor Author

slwu89 commented May 30, 2023

Thanks for jumping in here and giving us some advice @odow! I just tried the most recent model formulation you posted; after calling SDDP.train it hangs upon starting the forward/backward iterations (I quit after ~20 mins).

@sdwfrost
Contributor

@slwu89 - are you using the latest main of JuMP and SDDP? I can get @odow's first model running; it's just very slow. I'll report back later when it's gone through 10 iterations.

@slwu89
Contributor Author

slwu89 commented May 30, 2023

Hi @sdwfrost, yeah, I'm on Julia 1.9 and have updated Ipopt/JuMP/SDDP to the latest versions.

@sdwfrost
Contributor

sdwfrost commented May 30, 2023

This one is running for me (although at 80 min for the first iteration)

model = SDDP.LinearPolicyGraph(
    stages = nsteps,
    sense = :Min,
    lower_bound = 0,
    optimizer = Ipopt.Optimizer,
) do sp, t
    @variables(sp, begin
        0 <= S, SDDP.State, (initial_value = u0[1])
        0 <= I, SDDP.State, (initial_value = u0[2])
        0 <= C, SDDP.State, (initial_value = 0)
        0 <= υ_cumulative <= υ_total, SDDP.State, (initial_value = 0)
        0 <= υ <= υ_max
        ω_inf
        ω_rec
        penalty >= 0
        infection >= 0
        recovery >= 0
    end)
    @NLconstraints(sp, begin
        infection == δt * ((1 - υ) * β * I.in * S.in) + ω_inf * sqrt((1 - υ) * β * I.in * S.in)
        recovery == δt * (γ * I.in) + ω_rec * sqrt(γ * I.in)
    end)
    @constraints(sp, begin
        υ_cumulative.out == υ_cumulative.in + δt * υ - penalty
        S.out == S.in - infection
        I.out == I.in + infection - recovery
        C.out == C.in + infection
    end)
    D = Distributions.Normal(0, sqrt(δt))
    Ω1, Ω2 = sort(rand(D, 100)), sort(rand(D, 100))
    Ω = [(inf = ω1, rec = ω2) for ω1 in Ω1 for ω2 in Ω2]
    SDDP.parameterize(sp, Ω) do ω
        JuMP.fix(ω_inf, ω.inf)
        JuMP.fix(ω_rec, ω.rec)
    end
    @stageobjective(sp, υ_cumulative.out + 40 * C.out + 100 * penalty)
end

@slwu89
Contributor Author

slwu89 commented May 30, 2023

Thanks, that is where I am at too.

@odow

odow commented May 30, 2023

although at 80 min for the first iteration

Now that I take a closer look, the issue is that you have 100 * 100 samples for the Monte Carlo approximation. And assuming

tmax = 100.0
δt = 1.0
nsteps = Int(tmax / δt);

then we need to solve approximately 1_000_000 nonlinear programs per iteration! This could be okay, except that JuMP currently rebuilds the model from scratch every time: jump-dev/JuMP.jl#1185. Problems like this are one reason that we're in the process of rewriting JuMP's nonlinear support.

I'd simplify the approximation with something like

D = Distributions.Normal(0, sqrt(δt))
Ω = [(inf = rand(D), rec = rand(D)) for _ in 1:100]

@slwu89
Contributor Author

slwu89 commented Jun 1, 2023

Thanks for the helpful advice @odow - ha, I see now that in the backwards pass that will generate a huge number of solves. I'll play around to find a balance between fewer noise samples per stage and a time step size that gives reasonable solutions (δt = 1 was too large and the approximation broke down).
