
[question] Achieve constraints via return of Inf #82

Open
Maximilian-Stefan-Ernst opened this issue Jun 27, 2022 · 3 comments

Maximilian-Stefan-Ernst commented Jun 27, 2022

If you don't want questions in your issues, please just delete.

We have complicated constraints that cannot (or at least not obviously) be enforced via a proximal operator. However, with Optim.jl we've had good experience simply returning Inf whenever the constraints are not met (there are theoretical reasons why this should work, as well as a lot of practical experience).

Using ProximalAlgorithms, this strategy leads to convergence problems. Do you know whether ProximalAlgorithms handles Inf values differently than Optim does? Is there any advantage to returning floatmax(...) or some other very large number instead of Inf?

@lostella
Member

@Maximilian-Stefan-Ernst I'm not sure how Optim deals with Inf values. From what you say, I'm guessing you are trying to handle constraints inside some smooth term (having it return Inf when a condition is violated?), on which the algorithms will evaluate the gradient. I'm not sure how automatic differentiation handles that situation, but I can see it having issues: for a function to be differentiable at a point, it must be finite there.

Can you give an example of what you tried?


Maximilian-Stefan-Ernst commented Jun 28, 2022

Yes! Okay, so our objective is of the form x -> logdet(Σ(x)) + tr(inv(Σ(x))), where x is a real-valued vector of parameters and Σ is an intermediate matrix, constructed from the parameters, that has to be positive definite. The final objective is then a function of Σ, i.e. logdet(Σ) + tr(inv(Σ)).

So what works fine for us with Optim and NLopt is to check whether Σ(x) is positive definite and return Inf if it is not. An MWE using automatic differentiation looks like this:

using Zygote, Optim, ForwardDiff, ProximalAlgorithms, ProximalOperators, LinearAlgebra, ProximalCore

function f(x)

    # in reality, Σ is a more complicated function of the parameters
    Σ = [x[1] 0.0
         0.0 x[2]]

    # Since Σ is diagonal, it is positive definite iff x[1] > 0, x[2] > 0
    if !isposdef(Σ)
        return Inf
    else
        return logdet(Σ) + tr(inv(Σ))
    end

end

# works just fine
sol = Optim.optimize(f, [2.0, 2.0], BFGS(); autodiff = :forward)

# finds the correct values of [1.0; 1.0]
sol.minimizer

# the same with ProximalAlgorithms
ProximalAlgorithms.PANOC()(x0 = [2.0, 2.0], f = f, g = NormL1(0.0))
# ERROR: MethodError: Cannot `convert` an object of type Nothing to an object of type Float64

# The reason seems to be that different autodiff backends handle inadmissible values differently
ForwardDiff.gradient(f, [-1.0, -1.0])
# returns [0.0; 0.0]

Zygote.gradient(f, [-1.0, -1.0])
# returns (nothing,)

In reality, for performance reasons, we supply analytical gradients instead. For inadmissible parameter values, we simply fill the gradient with ones (i.e., anything that does not satisfy Optim's convergence criterion on the gradient norm).

However, this fails for ProximalAlgorithms:

struct myproblem end

function ProximalCore.gradient!(grad, f::myproblem, x)

    Σ = [x[1] 0.0
         0.0 x[2]]

    if !isposdef(Σ)
        grad .= 1.0
        return Inf
    else
        ∇Σ = [1.0 0.0
              0.0 0.0
              0.0 0.0
              0.0 1.0]
        grad' .= vec(inv(Σ) - inv(Σ)^2)'*∇Σ
        return logdet(Σ) + tr(inv(Σ))
    end

end

ProximalAlgorithms.PANOC()(x0 = [2.0, 2.0], f = myproblem(), g = NormL1(0.0))

# output: ([-6.39742852169722e17, -6.39742852169722e17], 11)

but it starts to work again if we return floatmax(...) instead of Inf:

function ProximalCore.gradient!(grad, f::myproblem, x)

    Σ = [x[1] 0.0
         0.0 x[2]]

    if !isposdef(Σ)
        grad .= 1.0
        return floatmax(eltype(x)) # <- only this line changed
    else
        ∇Σ = [1.0 0.0
              0.0 0.0
              0.0 0.0
              0.0 1.0]
        grad' .= vec(inv(Σ) - inv(Σ)^2)'*∇Σ
        return logdet(Σ) + tr(inv(Σ))
    end

end

ProximalAlgorithms.PANOC()(x0 = [2.0, 2.0], f = myproblem(), g = NormL1(0.0))
# output: ([0.9999999997915161, 0.9999999997915161], 7)

Sorry for the long post, but I could not find an easier MWE that reproduces the problem.


lostella commented Jul 6, 2022

Sorry for the long post, but I could not find an easier MWE that reproduces the problem.

Not at all, thanks for the detailed report! Leaving aside the AD issues, PANOC does a line search internally, which I think can break badly when the function evaluates to Inf; there's a fairly strong requirement that f be real-valued (and actually smooth) to guarantee convergence.
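To illustrate the failure mode with a toy Armijo-style test (not PANOC's actual envelope-based search): once an iterate is infeasible and f returns Inf, the sufficient-decrease comparison degenerates to Inf <= Inf, which holds for every step size.

```julia
using LinearAlgebra

# Toy objective that returns Inf outside its domain, like the MWE above
f(x) = x[1] > 0 ? -log(x[1]) + x[1] : Inf

# Armijo-style sufficient-decrease test used by backtracking line searches
armijo(x, d, g, t; c = 1e-4) = f(x .+ t .* d) <= f(x) + c * t * dot(g, d)

# At a feasible point the test behaves: a too-long trial step makes the
# left-hand side large, the test fails, and we backtrack.
armijo([0.5], [1.0], [-1.0], 4.0)   # false

# At an infeasible point f(x) = Inf, so the right-hand side is also Inf
# and Inf <= Inf holds: any step size is "accepted", however useless.
armijo([-1.0], [-1.0], [1.0], 1.0)  # true
```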

The PANOCplus variant may work better in case f is simply differentiable (but then again, the trick used here really makes it non-differentiable). Otherwise, ForwardBackward involves no line search if you manually provide a step size, but that step size then needs to be small enough relative to the Lipschitz constant of the gradient of f.
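Something along these lines (a sketch only: I'm assuming `gamma` is the step-size keyword of the ForwardBackward constructor, and reusing the `myproblem` type from above; the step size would need tuning, since the Lipschitz constant of ∇f blows up near the boundary of the feasible set):

```julia
using ProximalAlgorithms, ProximalOperators

# ForwardBackward with a fixed step size: no line search ever evaluates
# f at a trial point, so the Inf branch is only hit if an iterate itself
# leaves the feasible set.
solver = ProximalAlgorithms.ForwardBackward(gamma = 1e-2, maxit = 10_000)
sol, it = solver(x0 = [2.0, 2.0], f = myproblem(), g = NormL1(0.0))
```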

To summarize, I’m not surprised if things break in your case 🙂 and now I’m curious how Optim deals with it.
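For what it's worth, when a constraint does admit a cheap projection, the pattern these algorithms are designed around is an indicator function in g rather than an Inf branch in f. A sketch for the diagonal MWE above, where Σ(x) is positive definite iff x .> 0 (using IndBox from ProximalOperators with an arbitrary small lower bound; trial points of the line search can still hit the Inf branch of f, so this is illustrative only):

```julia
using ProximalAlgorithms, ProximalOperators

# Encode x .>= 1e-8 as an indicator in g; its prox is the projection
# onto the box, so every prox step returns a feasible point.
sol, it = ProximalAlgorithms.PANOC()(x0 = [2.0, 2.0], f = f, g = IndBox(1e-8, Inf))
```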
